shubhamrgandhi/qwen3-8b-full-sft-prm-r2egym-swebench-k5-opus-distill-32k-lr5e6-multiturn

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 16, 2026License:otherArchitecture:Transformer Warm

The shubhamrgandhi/qwen3-8b-full-sft-prm-r2egym-swebench-k5-opus-distill-32k-lr5e6-multiturn model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. It was trained on the prm_sft_train dataset with a 32k context length. This model is specifically optimized for tasks related to its fine-tuning data, suggesting potential strengths in areas covered by that dataset.

Loading preview...

Overview

This model, shubhamrgandhi/qwen3-8b-full-sft-prm-r2egym-swebench-k5-opus-distill-32k-lr5e6-multiturn, is an 8 billion parameter language model derived from the Qwen3-8B architecture. It has been fine-tuned using the prm_sft_train dataset, indicating a specialization towards the content and style of that specific training data. The model supports a substantial context length of 32,768 tokens, allowing it to process and generate longer sequences of text.

Training Details

The fine-tuning process involved a learning rate of 5e-06 and was conducted over 3.0 epochs. It utilized a multi-GPU setup with 8 devices, employing the AdamW optimizer with cosine learning rate scheduling and a warmup ratio of 0.1. The training environment included Transformers 4.57.6, Pytorch 2.9.1+cu128, Datasets 4.0.0, and Tokenizers 0.22.2.

Potential Use Cases

Given its fine-tuning on the prm_sft_train dataset, this model is likely best suited for applications that align with the characteristics and domain of that specific data. Developers should evaluate its performance on tasks similar to the fine-tuning objective to determine its suitability for their particular use case.