shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean
This model is an 8-billion-parameter Qwen3-based language model, fine-tuned by shubhamrgandhi on the prm_sft_train dataset. It supports a 32,768-token context length and was trained with a learning rate of 5e-06, adapting the base Qwen3-8B architecture to the characteristics of that dataset.
Model Overview
This model, qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean, is an 8 billion parameter language model built upon the Qwen3-8B architecture. It has been specifically fine-tuned by shubhamrgandhi using the prm_sft_train dataset, indicating a specialization derived from this training data. The model supports a substantial context length of 32,768 tokens, making it suitable for processing longer inputs and generating extended outputs.
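The snippet below is a minimal loading and generation sketch using the Hugging Face `transformers` library, assuming the checkpoint is hosted on the Hub under the repository name in the title; the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository ID; adjust if the checkpoint lives elsewhere.
model_id = "shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place the 8B parameters across available devices
)

messages = [{"role": "user", "content": "Summarize the benefit of a 32k context window in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```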
Training Details
The fine-tuning process involved a learning rate of 5e-06 and utilized a cosine learning rate scheduler with a warmup ratio of 0.1. Training was conducted over 3.0 epochs with a total batch size of 8 across 8 GPUs, employing the ADAMW_TORCH_FUSED optimizer. These parameters suggest a focused optimization strategy to adapt the base Qwen3-8B model to the characteristics of the prm_sft_train dataset.
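For reference, the reported hyperparameters can be expressed as Hugging Face `TrainingArguments`. This is a hypothetical reconstruction: the actual training script, per-device batch size split, precision, and any other settings are not documented here.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    per_device_train_batch_size=1,  # assumption: 1 per GPU x 8 GPUs = total batch size 8
    optim="adamw_torch_fused",
    bf16=True,                      # assumption: mixed-precision training
)
```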
Key Characteristics
- Base Model: Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32,768 tokens
- Fine-tuning Dataset: prm_sft_train
- Optimizer: ADAMW_TORCH_FUSED
Further details regarding the model's specific intended uses, limitations, and comprehensive evaluation data are not provided in the current documentation.