rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Oct 6, 2025 · License: other · Architecture: Transformer

rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921 is an 8-billion-parameter language model fine-tuned from rl-rag/qwen3-8B-sft-mix-v20250921. It supports a 32,768-token context length and was trained on an on-policy reinforcement-learning dataset targeting long-form generation, optimizing it for tasks that require extended, coherent text output.


Model Overview

This model, qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921, is an 8-billion-parameter language model with a 32,768-token context length. It is a fine-tuned iteration of the rl-rag/qwen3-8B-sft-mix-v20250921 base model.

Key Characteristics

  • Base Model: Fine-tuned from rl-rag/qwen3-8B-sft-mix-v20250921.
  • Context Length: Supports a long context window of 32,768 tokens, enabling processing and generation of extensive text.
  • Specialized Fine-tuning: The model underwent specific fine-tuning on the rl-rag/sft-mix-v20251001-onpolicy-rs-longform_0921 dataset. This indicates an optimization for tasks involving long-form content generation, likely leveraging on-policy reinforcement learning strategies.
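Because the 32,768-token window bounds the prompt and the generated output together, long-form generation requests need to budget output length against the prompt size. A minimal sketch of that arithmetic (the helper name and example numbers below are illustrative, not from the model card):

```python
CTX_LENGTH = 32_768  # model's maximum context, per the card


def max_new_tokens(prompt_tokens: int, ctx_length: int = CTX_LENGTH,
                   reserve: int = 0) -> int:
    """Return how many tokens can still be generated after the prompt.

    `reserve` optionally holds back room for special/system tokens.
    Raises if the prompt alone already fills the window.
    """
    budget = ctx_length - reserve - prompt_tokens
    if budget <= 0:
        raise ValueError("prompt does not fit in the context window")
    return budget


# A 2,000-token prompt leaves 30,768 tokens for long-form output.
print(max_new_tokens(2_000))  # → 30768
```

In practice the prompt token count would come from the model's own tokenizer; the point is only that generation length is the remainder of the fixed window.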

Training Details

The training procedure involved:

  • Learning Rate: 4e-05
  • Batch Size: A per-device train_batch_size of 1 with gradient_accumulation_steps of 16, yielding a total_train_batch_size of 128 (implying distributed training across 8 devices, since 1 × 16 × 8 = 128).
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 5 epochs.
  • Frameworks: Utilized Transformers 4.52.4, PyTorch 2.8.0+cu128, Datasets 3.6.0, and Tokenizers 0.21.1.
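The schedule above can be sketched in a few lines: the effective-batch-size arithmetic, and a cosine learning-rate curve with a 0.1 warmup ratio. This is a generic reimplementation for illustration, not the exact scheduler code used in training:

```python
import math

# Effective batch size: per-device batch × grad accumulation × device count.
# A total of 128 with batch 1 and accumulation 16 implies 8 devices.
per_device, grad_accum, devices = 1, 16, 8
total_batch = per_device * grad_accum * devices
assert total_batch == 128


def cosine_lr(step: int, total_steps: int, peak_lr: float = 4e-5,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Halfway through warmup the rate is half of 4e-5; at the end of
# warmup it reaches the peak; at the final step it decays to ~0.
print(cosine_lr(50, 1000))    # → 2e-05
print(cosine_lr(100, 1000))   # → 4e-05
```

Transformers' built-in cosine scheduler with warmup follows the same shape, parameterized by warmup steps rather than a ratio.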

Potential Use Cases

Given its fine-tuning on a long-form dataset, this model is likely well-suited for applications requiring:

  • Extended Text Generation: Creating detailed articles, reports, stories, or other lengthy documents.
  • Summarization of Long Documents: Processing and summarizing very long texts due to its large context window.
  • Complex Question Answering: Answering questions that require synthesizing information from extensive source material.
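For documents that exceed even a 32k window, a common pattern is map-reduce summarization: split the text into context-sized chunks, summarize each, then summarize the summaries. A hedged sketch of the chunking step, with a whitespace split standing in for the model's real tokenizer (the function name and parameters are illustrative):

```python
def chunk_tokens(tokens: list[str], chunk_size: int,
                 overlap: int = 0) -> list[list[str]]:
    """Split a token list into chunks of at most chunk_size tokens.

    Adjacent chunks share `overlap` tokens so that sentences cut at a
    boundary still appear whole in at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]


# Whitespace split stands in for the model's tokenizer here.
tokens = ("token " * 10).split()
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
print([len(c) for c in chunks])  # → [4, 4, 4, 1]
```

Each chunk would then be passed to the model with a summarization prompt, keeping every request safely inside the context budget.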