shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 27, 2026License:otherArchitecture:Transformer Cold

The shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. It was trained on the prm_sft_train dataset with a context length of 32768 tokens. This model is a specialized iteration focusing on specific fine-tuning objectives, though further details on its unique capabilities are not explicitly provided in the available documentation. It is intended for applications requiring a Qwen3-8B base model with this particular fine-tuning configuration.

Loading preview...

Model Overview

This model, shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think, is an 8 billion parameter language model derived from the Qwen/Qwen3-8B architecture. It has undergone a specific fine-tuning process on the prm_sft_train dataset.

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Fine-tuning Dataset: prm_sft_train
  • Context Length: 32,768 tokens
  • Learning Rate: 5e-06
  • Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
  • Epochs: 3.0
  • Batch Size: 1 (train), 8 (eval) per device, totaling 8 (train) and 64 (eval) across 8 devices.

Intended Use Cases

While specific intended uses and limitations are not detailed in the provided documentation, this model is suitable for tasks that align with the fine-tuning objectives of the prm_sft_train dataset. Developers should evaluate its performance for their specific applications, particularly those benefiting from a Qwen3-8B base with this training configuration.