shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think
The shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. It was trained on the prm_sft_train dataset with a context length of 32768 tokens. This model is a specialized iteration focusing on specific fine-tuning objectives, though further details on its unique capabilities are not explicitly provided in the available documentation. It is intended for applications requiring a Qwen3-8B base model with this particular fine-tuning configuration.
Loading preview...
Model Overview
This model, shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think, is an 8 billion parameter language model derived from the Qwen/Qwen3-8B architecture. It has undergone a specific fine-tuning process on the prm_sft_train dataset.
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Fine-tuning Dataset:
prm_sft_train - Context Length: 32,768 tokens
- Learning Rate: 5e-06
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- Epochs: 3.0
- Batch Size: 1 (train), 8 (eval) per device, totaling 8 (train) and 64 (eval) across 8 devices.
Intended Use Cases
While specific intended uses and limitations are not detailed in the provided documentation, this model is suitable for tasks that align with the fine-tuning objectives of the prm_sft_train dataset. Developers should evaluate its performance for their specific applications, particularly those benefiting from a Qwen3-8B base with this training configuration.