shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened
The shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened model is an 8-billion-parameter language model, fine-tuned from Qwen/Qwen3-8B. It was trained on the prm_sft_train dataset with a 32k context length via full-parameter supervised fine-tuning (SFT), making it suitable for applications requiring a robust, instruction-following model.
Model Overview
This model, qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B architecture. It was produced by full-parameter supervised fine-tuning (SFT) on the prm_sft_train dataset, with a context length of 32,768 tokens.
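The checkpoint can be loaded like any Hugging Face causal LM. The sketch below is a minimal example, assuming a recent version of transformers with Qwen3 support and enough memory for an 8B model; the `generate_reply` helper and its parameters are illustrative, not part of the released model card.

```python
MODEL_ID = "shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened"

def load_model():
    """Download and load the checkpoint (heavy: ~16 GB in bf16)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # spread across available GPUs
    )
    return tokenizer, model

def generate_reply(tokenizer, model, user_message, max_new_tokens=512):
    """Hypothetical chat helper using the tokenizer's built-in chat template."""
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens before decoding the reply.
    return tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Loading and generation are kept inside functions so that importing this snippet does not trigger the multi-gigabyte download.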
Training Details
The fine-tuning run used the following hyperparameters:
- Learning Rate: 5e-06
- Batch Size: 1 (train), 8 (eval)
- Optimizer: ADAMW_TORCH_FUSED
- Scheduler: Cosine with 0.1 warmup ratio
- Epochs: 3.0
Training ran on 8 GPUs. This configuration was aimed at strengthening the model's instruction-following and general language understanding across its full 32,768-token context window.
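The hyperparameters above can be collected into a config sketch, which also makes the effective global batch size explicit. Gradient accumulation is not reported, so the helper below assumes a factor of 1; the dict keys mirror common Hugging Face `TrainingArguments` names but this is an illustration, not the actual training script.

```python
# Reported fine-tuning configuration (gradient accumulation assumed to be 1).
TRAIN_CONFIG = {
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3.0,
    "num_gpus": 8,
}

def effective_train_batch_size(cfg, grad_accum_steps=1):
    """Global batch = per-device batch x number of GPUs x accumulation steps."""
    return cfg["per_device_train_batch_size"] * cfg["num_gpus"] * grad_accum_steps
```

With a per-device batch of 1 on 8 GPUs and no accumulation, the effective global batch size is 8.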
Intended Use Cases
Given its fine-tuning methodology, this model is likely suitable for:
- Instruction Following: Excelling in tasks where precise adherence to prompts is critical.
- Long Context Applications: Handling and generating coherent text over extended inputs, up to 32k tokens.
- General Language Tasks: Performing well in a variety of natural language processing applications due to its Qwen3-8B base and subsequent fine-tuning.
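For long-context applications, the practical constraint is that the prompt plus the generation budget must fit inside the 32,768-token window. A small sanity-check helper (illustrative, not part of the model's tooling) makes this concrete:

```python
CONTEXT_LEN = 32_768  # model's maximum context length in tokens

def fits_in_context(n_prompt_tokens, max_new_tokens, context_len=CONTEXT_LEN):
    """Return True if prompt tokens plus the generation budget fit in the window."""
    return n_prompt_tokens + max_new_tokens <= context_len
```

For example, a 30,000-token prompt leaves room for at most 2,768 newly generated tokens.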