shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-multiturn
The shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-multiturn model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the prm_sft_train dataset with a 32K context length and a 5e-6 learning rate, and is aimed at multi-turn conversational tasks.
Model Overview
This model, shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-multiturn, is an 8-billion-parameter language model derived from Qwen/Qwen3-8B. It has been fine-tuned on the prm_sft_train dataset, which suggests it is optimized for instruction-following and conversational tasks.
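As a minimal loading sketch, assuming the standard Hugging Face transformers API (the dtype and device settings below are illustrative, not from the model card):

```python
# Minimal loading sketch; the model ID comes from this card,
# torch_dtype/device_map settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-multiturn"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # requires the accelerate package
)
```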
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Fine-tuning Dataset: prm_sft_train
- Context Length: 32,768 tokens
- Learning Rate: 5e-06
- Optimizer: AdamW (fused) with specific beta and epsilon values
- Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio
- Epochs: 3.0
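The hyperparameters above could be expressed with a transformers TrainingArguments configuration along these lines. This is an illustrative reconstruction, not the original training script; the batch size, gradient accumulation, precision, and AdamW beta/epsilon values are assumptions not taken from the card:

```python
# Illustrative reconstruction of the listed hyperparameters.
# Batch size, gradient accumulation, and bf16 are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-multiturn",
    learning_rate=5e-6,                 # from the card
    lr_scheduler_type="cosine",         # cosine schedule, per the card
    warmup_ratio=0.1,                   # per the card
    num_train_epochs=3.0,               # per the card
    optim="adamw_torch_fused",          # fused AdamW, per the card
    per_device_train_batch_size=1,      # assumption
    gradient_accumulation_steps=8,      # assumption
    bf16=True,                          # assumption
)
```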
Potential Use Cases
Given its supervised fine-tuning (SFT) on the prm_sft_train dataset and its multi-turn designation, this model is likely suitable for the following (a brief usage sketch appears at the end of this section):
- Multi-turn dialogue systems
- Instruction-following applications
- Chatbot development
Further details on intended uses, limitations, and specific evaluation results are not provided in the original model card.
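As a quick illustration of multi-turn use, assuming the tokenizer ships the standard Qwen3 chat template and reusing the model and tokenizer loaded in the earlier sketch (the conversation content and generation length are purely examples):

```python
# Multi-turn chat sketch; assumes the standard Qwen3 chat template.
# `model` and `tokenizer` are the objects loaded in the earlier sketch.
messages = [
    {"role": "user", "content": "Summarize the main idea of gradient descent."},
    {"role": "assistant", "content": "Gradient descent iteratively updates parameters in the direction that reduces the loss."},
    {"role": "user", "content": "Now give a two-line code example."},
]

# Render the multi-turn history into model inputs and append the assistant prompt.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated assistant turn.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```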