# laion/sera-subset-mixed-316-axolotl__Qwen3-8B-v8
The sera-subset-mixed-316-axolotl__Qwen3-8B-v8 model is an 8-billion-parameter language model based on the Qwen3 architecture, fine-tuned by laion. It was trained with axolotl on a mixed subset of the `ethanlshen/sera-subset` dataset and is aimed at instruction following and agentic tasks. The model supports a 32,768-token context window and targets conversational and agent-based applications.
## Model Overview
This model, sera-subset-mixed-316-axolotl__Qwen3-8B-v8, is an 8-billion-parameter variant of the Qwen3 architecture, developed by laion. It has undergone Supervised Fine-Tuning (SFT) using the axolotl framework. The training data consists of a 316-row mixed subset of the `ethanlshen/sera-subset` dataset, which includes both unresolved (stage1) and resolved (stage2) entries, indicating a focus on agentic and instruction-following capabilities.
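The model can be loaded with the standard Hugging Face transformers API. The sketch below is illustrative: the repository id comes from this card, while the dtype, device placement, and generation settings are assumptions.

```python
# Minimal inference sketch with transformers; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/sera-subset-mixed-316-axolotl__Qwen3-8B-v8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card states bf16 training precision
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the SERA dataset in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```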
## Key Characteristics
- Base Model: Qwen3-8B
- Training Method: Supervised Fine-Tuning (SFT) with axolotl
- Dataset: A mixed subset of `ethanlshen/sera-subset` (316 rows)
- Context Length: Supports a substantial context window of 32,768 tokens.
- Chat Template: Configured to use the `chatml` format.
- Hyperparameters: Trained with a learning rate of 1e-5, a global batch size of 32, 3 epochs, and bf16 precision with DeepSpeed ZeRO-3 (see the config sketch after this list).
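For reference, these hyperparameters map onto axolotl config fields roughly as follows. This is a reconstruction, not the actual training config: the dataset `type`, the micro-batch/accumulation split behind the global batch size of 32, and the DeepSpeed config path are assumptions.

```yaml
# Hypothetical axolotl config reconstructing the stated hyperparameters.
base_model: Qwen/Qwen3-8B
chat_template: chatml
sequence_len: 32768

datasets:
  - path: ethanlshen/sera-subset
    type: chat_template            # assumed dataset format

learning_rate: 1.0e-5
num_epochs: 3
micro_batch_size: 1                # assumed split: 1 x 4 accum x 8 GPUs = 32 global
gradient_accumulation_steps: 4
bf16: true
deepspeed: deepspeed_configs/zero3.json   # ZeRO-3, path as shipped with axolotl
```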
## Intended Use Cases
This model is particularly well-suited for applications requiring:
- Instruction Following: Its training on the SERA dataset suggests proficiency in understanding and executing complex instructions.
- Agentic Tasks: The dataset's composition (stage1 unresolved + stage2 resolved) implies an optimization for agent-like reasoning and problem-solving.
- Conversational AI: The `chatml` template and large context window make it suitable for extended, coherent dialogues; a rendered example of the template follows below.
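To see exactly what the model receives in a multi-turn conversation, the chat template can be rendered to a string without tokenizing. This is a sketch; the dialogue content is made up for illustration.

```python
# Sketch: render a multi-turn dialogue through the chatml template to inspect
# the exact prompt string the model sees (no generation performed).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "laion/sera-subset-mixed-316-axolotl__Qwen3-8B-v8"
)

messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Plan the steps to reproduce a failing test."},
    {"role": "assistant", "content": "1. Check out the commit. 2. Run the test suite."},
    {"role": "user", "content": "Now carry out step 1."},
]

# tokenize=False returns the raw chatml-formatted string, i.e. turns wrapped
# in <|im_start|>role ... <|im_end|> markers, rather than token ids.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```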