laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v3
laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v3 is an 8-billion-parameter language model based on the Qwen3 architecture and fine-tuned with Axolotl. It features a 32,768-token context length and is optimized for structured-output tasks, particularly nested JSON structures and tool calls, addressing issues such as malformed JSON generation. The model is designed for applications requiring precise and reliable structured data generation.
Model Overview
laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v3 is an 8-billion-parameter model built on the Qwen3-8B base architecture. It was fine-tuned with the Axolotl framework to improve structured output generation, particularly for nested JSON and tool-calling scenarios. The fine-tune addresses issues seen in earlier versions with malformed JSON and collapsed argument structures.
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Context Length: 32,768 tokens
- Training Framework: Axolotl (version 0.16.0.dev0)
- Dataset: laion/Sera-4.6-Lite-T2-v4-316, a chat-template dataset focusing on messages with tool calls.
- Epochs: Trained for 6 epochs, an increase from 3 epochs in previous versions, to better capture complex data structures.
- Learning Rate: 1e-5 with a cosine scheduler.
- Optimization: Uses the `adamw_torch` optimizer with `gradient_accumulation_steps` set to 8.
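The hyperparameters above map onto an Axolotl config roughly like the following sketch. This is a reconstruction from the values listed on this card, not the actual training config; the dataset `type` and any fields not stated above are assumptions.

```yaml
# Hypothetical Axolotl config sketch reconstructing the settings above.
# Values are taken from this model card; remaining fields are assumed.
base_model: Qwen/Qwen3-8B
sequence_len: 32768

datasets:
  - path: laion/Sera-4.6-Lite-T2-v4-316
    type: chat_template   # assumed type for a tool-call message dataset

num_epochs: 6
learning_rate: 1.0e-5
lr_scheduler: cosine
optimizer: adamw_torch
gradient_accumulation_steps: 8
```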
Intended Use Cases
This model is particularly suited for applications requiring:
- Reliable Structured Output: Generating well-formed, nested JSON structures.
- Tool Calling: Executing and interpreting tool calls with accurate argument parsing.
- Complex Instruction Following: Handling intricate instructions that involve structured data manipulation.
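Since the fine-tune specifically targets well-formed nested JSON in tool calls, a consumer of the model's output would typically validate that the call parses and that its arguments decode to a nested object rather than a collapsed string. The sketch below illustrates that check; the response string and tool-call schema are hypothetical, for illustration only.

```python
import json

# Hypothetical assistant response in a tool-call format; the exact
# schema the model emits is an assumption for this example.
raw_response = (
    '{"name": "get_weather", '
    '"arguments": {"location": {"city": "Berlin", "country": "DE"}, '
    '"units": "celsius"}}'
)

def parse_tool_call(text: str) -> dict:
    """Parse a tool-call string, raising if the JSON is malformed."""
    call = json.loads(text)  # raises json.JSONDecodeError on bad JSON
    # The fine-tune targets nested argument objects, so verify that
    # "arguments" decoded to a dict instead of a collapsed string.
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("arguments did not decode to a nested object")
    return call

call = parse_tool_call(raw_response)
print(call["name"], call["arguments"]["location"]["city"])
# → get_weather Berlin
```

In practice this validation would run on text generated by the model; on a parse failure, the caller can re-prompt or reject the output.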