laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2
laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B by laion. It is trained specifically on the laion/Sera-4.6-Lite-T2-v4-316 dataset, which pre-renders tool calls into message content using the Hermes/Qwen3 wire format. The model supports a 32,768-token context length and is optimized for tool use and agentic workflows.
Overview
laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2 is an 8 billion parameter model fine-tuned from the Qwen/Qwen3-8B base architecture. It was trained using Axolotl version 0.16.0.dev0 on the laion/Sera-4.6-Lite-T2-v4-316 dataset. A key aspect of its training involved ensuring the chat template used during fine-tuning precisely matched the inference-time rendering of Qwen3-8B's default template, particularly for handling multi-turn contexts and stripping <think> blocks.
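Because the training data was rendered with the base model's default template, inference can use the tokenizer's built-in chat template directly. Below is a minimal sketch assuming a standard transformers setup; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch (assumes a recent transformers release with Qwen3 support).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Summarize why chat-template consistency matters for fine-tunes."},
]

# Render with the tokenizer's built-in (Qwen3 default) chat template; no custom
# template is needed because fine-tuning used the same rendering.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```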
Key Capabilities
- Tool Use Optimization: The model is fine-tuned on a dataset in which tool calls are pre-rendered into message content using the Hermes/Qwen3 wire format, suggesting a strong fit for agentic tasks and tool integration (see the format sketch after this list).
- Extended Context Window: Supports a sequence_len of 32768 tokens, enabling processing of longer inputs and more complex multi-turn interactions.
- Consistent Chat Templating: Addresses a common fine-tuning pitfall by aligning training-time chat template rendering with the base model's inference-time rendering, which is crucial for maintaining performance in multi-turn conversations.
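To illustrate what "pre-rendered" tool calls look like, here is a sketch of the Hermes/Qwen3 wire format, with the call and its result embedded directly in message content as text. The tool name, arguments, and role used for the tool result are hypothetical examples, not taken from the dataset:

```python
# Sketch of the Hermes/Qwen3 wire format with tool calls pre-rendered into content.
# The tool name and arguments below are made up for illustration.
messages = [
    {"role": "user", "content": "What's the weather in Berlin right now?"},
    {
        "role": "assistant",
        "content": (
            "<tool_call>\n"
            '{"name": "get_weather", "arguments": {"city": "Berlin"}}\n'
            "</tool_call>"
        ),
    },
    {
        # The tool result may be rendered under "user" or "tool" depending on how
        # the dataset pre-renders it; shown here as an assumption.
        "role": "user",
        "content": (
            "<tool_response>\n"
            '{"temperature_c": 14, "condition": "overcast"}\n'
            "</tool_response>"
        ),
    },
    {"role": "assistant", "content": "It is currently about 14 degrees and overcast in Berlin."},
]
```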
Training Details
The model was trained with a learning rate of 1e-05, gradient_accumulation_steps of 8, and a micro_batch_size of 1, giving a total_train_batch_size of 32. Training used the AdamW optimizer with a cosine learning-rate scheduler and bf16 precision, with Flash Attention enabled to improve efficiency.
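For reference, these hyperparameters correspond roughly to the Hugging Face TrainingArguments below. The actual run used Axolotl 0.16.0.dev0, so this is only a sketch; the output directory and the implied device count (total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices, i.e. 1 × 8 × 4 = 32) are assumptions:

```python
# Rough TrainingArguments equivalent of the reported Axolotl hyperparameters.
# NOTE: output_dir and the implied 4-device setup are assumptions; Flash Attention
# is configured at model load time, not here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sera-qwen3-8b-finetune",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=1,        # micro_batch_size
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,
)
```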