varshak1/open_reward_agent_qwen3_8b_sft_v1
varshak1/open_reward_agent_qwen3_8b_sft_v1 is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B on the open_reward_agent_sft_mix dataset. It inherits the Qwen3 architecture and its 32768-token context length, and its specialized training data indicates it is optimized for reward-agent applications.
Model Overview
varshak1/open_reward_agent_qwen3_8b_sft_v1 is an 8-billion-parameter language model built on the Qwen3-8B architecture. It has been fine-tuned on the open_reward_agent_sft_mix dataset, indicating specialization for tasks involving reward agents.
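The model can be loaded with the standard Hugging Face transformers API. The sketch below assumes the repository ships the usual tokenizer and weight files; the prompt is purely illustrative.

```python
# Minimal inference sketch for varshak1/open_reward_agent_qwen3_8b_sft_v1,
# assuming the standard transformers chat interface. The prompt is a
# hypothetical example, not a documented input format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "varshak1/open_reward_agent_qwen3_8b_sft_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Rate this answer for helpfulness: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```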
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens
- Training Dataset: open_reward_agent_sft_mix
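These characteristics can be sanity-checked from the repository's configuration alone; a quick sketch, assuming a standard config.json is published:

```python
# Fetches only config.json (no model weights) to confirm the specs above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("varshak1/open_reward_agent_qwen3_8b_sft_v1")
print(config.model_type)               # expected: "qwen3"
print(config.max_position_embeddings)  # expected: 32768
```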
Training Details
The model was trained with the following hyperparameters (a sketch of the corresponding TrainingArguments follows the list):
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation: 16 steps, for a total effective batch size of 128 (with a per-device train batch size of 1, this implies 8 training devices: 1 × 16 × 8 = 128)
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate schedule with a warmup ratio of 0.1
- Epochs: 3.0
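For reference, here is how these values map onto Hugging Face TrainingArguments. This is a reconstruction, not the author's training script; in particular, the device count of 8 is inferred from the effective batch size, and output_dir is hypothetical.

```python
# Hedged reconstruction of the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="open_reward_agent_qwen3_8b_sft_v1",  # hypothetical path
    learning_rate=4e-05,
    per_device_train_batch_size=1,  # effective batch of 128 implies 8 devices
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # "0.1 warmup" read as a warmup ratio
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```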
Intended Use
While specific details on intended uses and limitations are not provided in the original documentation, the model's fine-tuning on the open_reward_agent_sft_mix dataset strongly suggests utility in applications requiring reward-based agentic behavior, or in tasks related to reinforcement learning from human feedback (RLHF) such as scoring candidate responses. Developers should weigh this specialized training when evaluating the model for such use cases.
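As a concrete illustration of such a use case, the snippet below builds a judge-style prompt that asks the model to score a candidate answer. The prompt format here is an assumption; the input format actually used in open_reward_agent_sft_mix is not documented on this card.

```python
# Hypothetical judge-style prompt; the real expected format is undocumented.
question = "What causes tides?"
answer = "Tides are caused mainly by the gravitational pull of the Moon and Sun."

messages = [{
    "role": "user",
    "content": (
        "Evaluate the following answer for correctness and helpfulness, "
        "then give a score from 1 to 10.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    ),
}]
# Pass `messages` through apply_chat_template and generate as in the
# loading example above.
```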