shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 28, 2026 · License: other · Architecture: Transformer

The shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the prm_sft_train dataset with a 32k context length. Training combined supervised fine-tuning (SFT) and Proximal Policy Optimization (PPO), making the model suitable for applications that require robust instruction following.


Model Overview

This model, qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B architecture. It was fine-tuned with a combination of supervised fine-tuning (SFT) and Proximal Policy Optimization (PPO) on the prm_sft_train dataset, and supports a context length of 32,768 tokens.

Training Details

The fine-tuning run used the following hyperparameters:

  • Learning Rate: 5e-06
  • Batch Size: 1 (train), 8 (eval)
  • Optimizer: ADAMW_TORCH_FUSED
  • Scheduler: Cosine with 0.1 warmup ratio
  • Epochs: 3.0

Training ran on 8 GPUs. This configuration targets improved instruction following and general language understanding across the model's full 32k context window.
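The hyperparameters above map directly onto Hugging Face `TrainingArguments`. A minimal sketch of the reported configuration; the output directory and `bf16` setting are illustrative assumptions, not details from the original run:

```python
from transformers import TrainingArguments

# Sketch of the reported fine-tuning configuration. output_dir is a
# placeholder; distribution across the 8 GPUs is handled by the launcher
# (e.g. torchrun or accelerate), not by these arguments.
args = TrainingArguments(
    output_dir="qwen3-8b-prm-sft",   # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    bf16=True,                       # assumption: bf16 mixed precision
)
```

With a per-device train batch size of 1 across 8 GPUs, the effective global batch size is 8 (absent gradient accumulation, which the card does not report).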

Intended Use Cases

Given its fine-tuning methodology, this model is likely suitable for:

  • Instruction Following: Excelling in tasks where precise adherence to prompts is critical.
  • Long Context Applications: Handling and generating coherent text over extended inputs, up to 32k tokens.
  • General Language Tasks: Performing well in a variety of natural language processing applications due to its Qwen3-8B base and subsequent fine-tuning.
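For the instruction-following use case, a standard transformers inference sketch (assuming the checkpoint is hosted on the Hugging Face Hub under the model id shown, and that it uses the Qwen3 chat template inherited from its base model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-flattened"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Format a single user turn with the model's chat template.
messages = [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading an 8B checkpoint requires roughly 16 GB of accelerator memory in bf16 (less with the FP8 quantization noted in the listing), so `device_map="auto"` is used to let accelerate place the weights.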