`rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921` is an 8-billion-parameter language model fine-tuned from `rl-rag/qwen3-8B-sft-mix-v20250921`. It features a 32,768-token context length and was trained on an on-policy reinforcement-learning dataset for long-form generation, optimizing it for tasks that require extended, coherent text output.
## Model Overview
This model, `qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921`, is an 8-billion-parameter language model with a substantial 32,768-token context length. It is a fine-tuned iteration of the `rl-rag/qwen3-8B-sft-mix-v20250921` base model.
## Key Characteristics
- Base Model: Fine-tuned from `rl-rag/qwen3-8B-sft-mix-v20250921`.
- Context Length: Supports a long context window of 32,768 tokens, enabling processing and generation of extensive text.
- Specialized Fine-tuning: The model underwent specific fine-tuning on the `rl-rag/sft-mix-v20251001-onpolicy-rs-longform_0921` dataset. This indicates an optimization for tasks involving long-form content generation, likely leveraging on-policy reinforcement learning strategies.
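Before sending a long prompt to a 32,768-token model it is useful to check that it will fit. The sketch below is illustrative only: it uses a rough ~4-characters-per-token heuristic (an assumption; accurate counts require the model's actual Qwen3 tokenizer) and a hypothetical `fits_in_context` helper.

```python
# Rough check that a prompt fits the 32,768-token context window.
# Uses a ~4 chars/token heuristic instead of the real tokenizer (assumption).
CONTEXT_LENGTH = 32_768
CHARS_PER_TOKEN = 4  # rough average for English text

def fits_in_context(prompt: str, reserved_for_output: int = 2_048) -> bool:
    """Return True if the prompt likely fits, leaving room for generation."""
    estimated_tokens = len(prompt) // CHARS_PER_TOKEN + 1
    return estimated_tokens + reserved_for_output <= CONTEXT_LENGTH

print(fits_in_context("hello " * 100))   # short prompt -> True
print(fits_in_context("x" * 1_000_000))  # ~250k estimated tokens -> False
```

In practice you would replace the heuristic with a call to the model's tokenizer and count the returned token IDs.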
## Training Details
The training procedure involved:
- Learning Rate: 4e-05
- Batch Size: A `train_batch_size` of 1 with `gradient_accumulation_steps` of 16, resulting in a `total_train_batch_size` of 128 (consistent with distributed training across 8 devices: 1 × 16 × 8 = 128).
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: Trained for 5 epochs.
- Frameworks: Utilized Transformers 4.52.4, PyTorch 2.8.0+cu128, Datasets 3.6.0, and Tokenizers 0.21.1.
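The schedule above can be sketched in a few lines: linear warmup over the first 10% of steps up to the 4e-05 peak, then cosine decay to zero. The peak learning rate and warmup ratio come from the card; the total step count below is an illustrative placeholder, not a value reported for this training run.

```python
import math

# Sketch of the reported schedule: cosine decay with a 0.1 warmup ratio and a
# peak learning rate of 4e-05. TOTAL_STEPS is illustrative (assumption).
PEAK_LR = 4e-05
TOTAL_STEPS = 1_000
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # linear warmup
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))  # cosine decay

print(lr_at(0))             # 0.0
print(lr_at(WARMUP_STEPS))  # 4e-05 (peak, reached at end of warmup)
print(lr_at(TOTAL_STEPS))   # 0.0 (fully decayed)
```

This mirrors what `transformers`' cosine scheduler computes per optimizer step; the real trainer derives the step count from the dataset size, batch size, and epoch count.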
## Potential Use Cases
Given its fine-tuning on a long-form dataset, this model is likely well-suited for applications requiring:
- Extended Text Generation: Creating detailed articles, reports, stories, or other lengthy documents.
- Summarization of Long Documents: Processing and summarizing very long texts due to its large context window.
- Complex Question Answering: Answering questions that require synthesizing information from extensive source material.
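For documents that exceed even a 32,768-token window, a common pattern is map-reduce summarization: summarize chunks independently, then summarize the partial summaries. The sketch below shows only the control flow; `summarize_chunk` is a hypothetical stand-in for a model call (here it just truncates, so the example runs without model weights), and the chunk size assumes the ~4-chars-per-token heuristic.

```python
# Map-reduce summarization sketch for documents longer than the context window.
# `summarize_chunk` is a hypothetical placeholder for an actual model call.
def summarize_chunk(text: str, max_chars: int = 200) -> str:
    return text[:max_chars]  # stand-in: a real call would generate a summary

def summarize_long_document(document: str, chunk_chars: int = 100_000) -> str:
    # Map: split into chunks that each fit the context window (~4 chars/token).
    chunks = [document[i:i + chunk_chars]
              for i in range(0, len(document), chunk_chars)]
    partial = [summarize_chunk(c) for c in chunks]
    # Reduce: condense the concatenated partial summaries in one final pass.
    return summarize_chunk(" ".join(partial), max_chars=500)

doc = "section text " * 50_000  # ~650k characters, far beyond one window
print(len(summarize_long_document(doc)) <= 500)  # True
```

With a real model behind `summarize_chunk`, the reduce step benefits directly from the long context window, since many partial summaries can be condensed in a single pass.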