yikeee/Open-Reward-Agent-sft-rubric-only
yikeee/Open-Reward-Agent-sft-rubric-only is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the open_reward_agent_rubric_sft_mix dataset, indicating an optimization for tasks involving reward-agent rubrics. It supports a 32768-token context length, making it suitable for processing extensive inputs in its specialized domain.
Model Overview
yikeee/Open-Reward-Agent-sft-rubric-only is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B architecture. Its training data, the open_reward_agent_rubric_sft_mix dataset, suggests a specialization in tasks involving reward-agent rubrics.
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens, enabling the processing of long inputs.
- Specialized Training: Fine-tuned on the open_reward_agent_rubric_sft_mix dataset, which targets reward-agent rubric tasks.
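Assuming the checkpoint is published on the Hugging Face Hub under the name above, it can be loaded with the standard transformers auto classes. This is a sketch, not documented usage from the card; imports are deferred inside the function so the snippet can be read and checked without pulling in transformers or downloading the 8B weights:

```python
def load_model(repo_id: str = "yikeee/Open-Reward-Agent-sft-rubric-only"):
    """Load the fine-tuned model and its tokenizer from the Hugging Face Hub.

    Deferred imports: calling this function requires transformers (and a
    backend such as PyTorch) to be installed, and will download the weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # torch_dtype="auto" picks the dtype stored in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return model, tokenizer
```

Since the base model is Qwen3-8B, the tokenizer's chat template (via `tokenizer.apply_chat_template`) is presumably the intended way to format conversations, though the card does not confirm this.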
Training Details
The model was trained for 3 epochs with a learning rate of 4e-05 and a total batch size of 64 (a per-device train_batch_size of 1 with gradient_accumulation_steps of 16; the reported total implies additional data parallelism). The optimizer was ADAMW_TORCH_FUSED with a cosine learning rate schedule. This configuration suggests a focused effort to adapt the base Qwen3-8B model to rubric-based tasks.
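The batch-size arithmetic can be made explicit. A per-device batch of 1 with 16 accumulation steps yields 16 examples per update, so reaching the reported total of 64 implies data parallelism across 4 devices; the world size of 4 is an inference from the numbers, not stated on the card:

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         world_size: int) -> int:
    """Total examples contributing to one optimizer update."""
    return per_device_batch * grad_accum_steps * world_size

# Values from the training configuration; world_size=4 is inferred,
# since 1 * 16 alone gives 16, not the reported total of 64.
print(effective_batch_size(1, 16, 4))  # -> 64
```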
Intended Use Cases
Specific intended uses and limitations are not documented. However, fine-tuning on a reward-agent rubric dataset suggests the model is suited to understanding, generating, or evaluating content against predefined rubrics, for example in reward modeling or agent-behavior assessment.
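One plausible usage pattern for such a model is prompting it with an answer and a list of rubric criteria to grade against. The prompt format below is purely illustrative: the actual schema expected by the model depends on its SFT data, which is not documented here.

```python
def build_rubric_prompt(question: str, answer: str, rubric: list[str]) -> str:
    """Assemble a rubric-grading prompt.

    Illustrative format only; the format the model was actually trained on
    (in open_reward_agent_rubric_sft_mix) may differ.
    """
    criteria = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(rubric))
    return (
        "Evaluate the answer against each rubric criterion.\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Rubric:\n{criteria}\n\n"
        "For each criterion, state whether it is satisfied and why."
    )

prompt = build_rubric_prompt(
    "What is 2 + 2?",
    "4",
    ["States the correct numeric result", "Shows reasoning"],
)
print(prompt)
```

The resulting string would then be wrapped in the tokenizer's chat template and passed to the model for generation.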