Nirbhayhero07/deepsentinel-overseer-small
Nirbhayhero07/deepsentinel-overseer-small is a 0.5 billion parameter instruction-tuned language model developed by Nirbhayhero07, fine-tuned from Qwen/Qwen2.5-0.5B-Instruct. It leverages the GRPO training method, known for enhancing mathematical reasoning in language models, and supports a context length of 32768 tokens. This model is optimized for tasks requiring improved reasoning capabilities, particularly in mathematical contexts.
Overview
Nirbhayhero07/deepsentinel-overseer-small is a 0.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-0.5B-Instruct base model. It was developed by Nirbhayhero07 and trained using the TRL library.
Key Capabilities
- Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in open language models.
- Instruction Following: As an instruction-tuned model, it is capable of understanding and executing user prompts effectively.
- Extended Context Window: Supports a context length of 32768 tokens, allowing the model to process long inputs and maintain conversational coherence over extended interactions (see the usage sketch below).
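Below is a minimal inference sketch using the Hugging Face Transformers chat API. The model id comes from this card; the example prompt and generation settings are illustrative assumptions, not recommended defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nirbhayhero07/deepsentinel-overseer-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "A train travels 180 km in 2.5 hours. What is its average speed?"},
]
# Build the prompt with the model's chat template (inherited from Qwen2.5-Instruct).
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```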
Training Details
The model was fine-tuned with TRL (Transformer Reinforcement Learning) using the GRPO method. This approach aims to improve the model's ability to handle complex reasoning tasks, particularly those with a mathematical foundation. Training used the following framework versions: TRL 1.2.0, Transformers 5.0.0, PyTorch 2.10.0+cu128, Datasets 4.8.4, and Tokenizers 0.22.2.
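For reference, here is a hedged sketch of what GRPO fine-tuning with TRL's `GRPOTrainer` looks like. The dataset, reward function, and output directory below are placeholders for illustration and are not the training setup actually used for this model.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; GRPOTrainer expects a "prompt" column.
train_dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward that prefers shorter completions. A real setup would
    # score mathematical correctness of the generated answers instead.
    return [-float(len(c)) for c in completions]

training_args = GRPOConfig(output_dir="deepsentinel-overseer-small-grpo")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # base model named on this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```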
Good For
This model is well suited to applications that demand strong reasoning, especially mathematical problem-solving and tasks where logical deduction is crucial, benefiting from its GRPO-enhanced training.