laion/sera-subset-mixed-10000-axolotl__Qwen3-8B-v8
The laion/sera-subset-mixed-10000-axolotl__Qwen3-8B-v8 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained with axolotl on a 10,000-row mixed subset of the `ethanlshen/sera-subset` dataset as part of the SERA recipe. The model features a 32,768-token context length and is designed for applications requiring robust performance on structured reasoning and agentic tasks.
Model Overview
This model, laion/sera-subset-mixed-10000-axolotl__Qwen3-8B-v8, is an 8-billion-parameter language model based on the Qwen/Qwen3-8B architecture. It has undergone Supervised Fine-Tuning (SFT) with the axolotl framework, following the SERA recipe to improve performance on complex reasoning tasks.
Key Training Details
The model was fine-tuned on a 10,000-row random mixed subset of the ethanlshen/sera-subset dataset, which includes both unresolved (stage1) and resolved (stage2) data. Key hyperparameters for training include:
- Learning Rate: 1e-5
- Batch Size: 32 (global)
- Epochs: 3
- Context Length: 32,768 tokens
- Chat Template: ChatML
Training used bf16 precision and DeepSpeed ZeRO-3 for optimization, following iteration i9 (version v8) of the upstream SERA recipe from the open-thoughts/OpenThoughts-Agent repository.
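For reference, the ChatML template named above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of that layout (illustrative only; in practice, use the tokenizer's built-in chat template rather than formatting strings by hand):

```python
def to_chatml(messages):
    """Render a message list in the ChatML layout used during fine-tuning.

    Sketch of the template shape only, not the exact template shipped
    with the model's tokenizer.
    """
    rendered = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    # A trailing open assistant turn cues the model to generate a reply.
    rendered.append("<|im_start|>assistant\n")
    return "\n".join(rendered)
```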
Intended Use Cases
This model is particularly well suited to applications that benefit from its specialized fine-tuning on the SERA dataset, which focuses on structured reasoning and agentic capabilities. Its 32,768-token context window also makes it suitable for processing and understanding the lengthy inputs such tasks often involve.
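A hedged usage sketch with the Hugging Face `transformers` library (the model ID is taken from this card; the generation settings are illustrative defaults, not values documented for this model):

```python
MODEL_ID = "laion/sera-subset-mixed-10000-axolotl__Qwen3-8B-v8"

def build_prompt(tokenizer, question):
    """Format a single user turn with the model's chat template (ChatML)."""
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

def generate(question, max_new_tokens=512):
    # Imported here so build_prompt stays usable without torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(
        build_prompt(tokenizer, question), return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```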