rl-rag/qwen3-8B-sft-mix-v20250921

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Sep 21, 2025 · License: other · Architecture: Transformer

rl-rag/qwen3-8B-sft-mix-v20250921 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the rl-rag/sft-mix-v20250921 dataset, which suggests an emphasis on instruction-following or mixed-task performance. With a context length of 32768 tokens, it is suited to applications that process moderately long inputs and generate coherent, relevant outputs aligned with its fine-tuning.


Overview

This model, rl-rag/qwen3-8B-sft-mix-v20250921, is an 8 billion parameter language model derived from the Qwen3-8B base architecture. It has undergone specific fine-tuning using the rl-rag/sft-mix-v20250921 dataset, suggesting an emphasis on instruction-following or a diverse set of tasks. The model supports a substantial context length of 32768 tokens, making it suitable for processing and generating content from moderately long inputs.
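
Below is a minimal loading and generation sketch, assuming the checkpoint is published under the repo id above and follows the standard transformers causal-LM layout; the prompt and generation settings are illustrative only.

```python
# Hedged sketch: load the fine-tuned checkpoint and generate a completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rl-rag/qwen3-8B-sft-mix-v20250921"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the checkpoint dtype
    device_map="auto",    # place layers across available devices
)

prompt = "Summarize the key ideas of retrieval-augmented generation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```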

Training Details

Fine-tuning used a learning rate of 4e-05 and a total training batch size of 128 (8 devices × 16 gradient accumulation steps, implying a per-device batch size of 1), and ran for 5 epochs. The optimizer was ADAMW_TORCH with a cosine learning rate scheduler and a warmup ratio of 0.1.
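
For reference, the sketch below mirrors these hyperparameters in a transformers TrainingArguments configuration; the per-device batch size of 1 is an inference from 128 = 8 × 16 × 1, and the output directory name is illustrative.

```python
# Hedged sketch of a TrainingArguments setup matching the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8B-sft-mix-v20250921",  # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # assumed: 8 devices x 16 accumulation steps -> 128 total
    gradient_accumulation_steps=16,
    num_train_epochs=5,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```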

Intended Use

Given its fine-tuning and 32k-token context window, this model is suited to applications that need specialized instruction-following or that work with long prompts. Developers should check whether the rl-rag/sft-mix-v20250921 dataset aligns with their use case; an instruction-style usage sketch follows.
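
Continuing from the loading sketch above, the example below formats an instruction-style request with the tokenizer's chat template; it assumes the fine-tune keeps Qwen3's chat format, which is not stated on this card.

```python
# Hedged usage sketch: instruction-style prompting via the chat template
# (assumes the tokenizer and model objects from the loading example above).
messages = [
    {"role": "user", "content": "List three practical uses of a 32k-token context window."},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the model's reply is printed.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(reply)
```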