laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2
laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2 is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B by laion. It is trained specifically on the laion/Sera-4.6-Lite-T2-v4-316 dataset, which pre-renders tool calls into message content using the Hermes/Qwen3 wire format. The model supports a 32,768-token context length and is optimized for tool use and agentic workflows.
Overview
laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2 is an 8 billion parameter model fine-tuned from the Qwen/Qwen3-8B base architecture. It was trained using Axolotl version 0.16.0.dev0 on the laion/Sera-4.6-Lite-T2-v4-316 dataset. A key aspect of its training involved ensuring the chat template used during fine-tuning precisely matched the inference-time rendering of Qwen3-8B's default template, particularly for handling multi-turn contexts and stripping <think> blocks.
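Because the training data was rendered with the base model's default template, inference can use the tokenizer's built-in chat template directly. Below is a minimal sketch assuming a standard transformers setup; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch (assumes a recent transformers release with Qwen3 support).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Summarize why chat-template consistency matters for fine-tunes."},
]

# Render with the tokenizer's built-in (Qwen3 default) chat template; no custom
# template is needed because fine-tuning used the same rendering.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```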
Key Capabilities
- Tool Use Optimization: The model is fine-tuned on a dataset in which tool calls are pre-rendered into message content using the Hermes/Qwen3 wire format, suggesting a strong fit for agentic tasks and tool integration (see the format sketch after this list).
- Extended Context Window: Supports a sequence_len of 32768 tokens, enabling processing of longer inputs and more complex multi-turn interactions.
- Consistent Chat Templating: Addresses a common fine-tuning pitfall by aligning training-time chat template rendering with the base model's inference-time rendering, which is crucial for maintaining performance in multi-turn conversations.
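To illustrate what "pre-rendered" tool calls look like, here is a sketch of the Hermes/Qwen3 wire format, with the call and its result embedded directly in message content as text. The tool name, arguments, and role used for the tool result are hypothetical examples, not taken from the dataset:

```python
# Sketch of the Hermes/Qwen3 wire format with tool calls pre-rendered into content.
# The tool name and arguments below are made up for illustration.
messages = [
    {"role": "user", "content": "What's the weather in Berlin right now?"},
    {
        "role": "assistant",
        "content": (
            "<tool_call>\n"
            '{"name": "get_weather", "arguments": {"city": "Berlin"}}\n'
            "</tool_call>"
        ),
    },
    {
        # The tool result may be rendered under "user" or "tool" depending on how
        # the dataset pre-renders it; shown here as an assumption.
        "role": "user",
        "content": (
            "<tool_response>\n"
            '{"temperature_c": 14, "condition": "overcast"}\n'
            "</tool_response>"
        ),
    },
    {"role": "assistant", "content": "It is currently about 14 degrees and overcast in Berlin."},
]
```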
Training Details
The model was trained with a learning rate of 1e-05, gradient_accumulation_steps of 8, and a micro_batch_size of 1, giving a total_train_batch_size of 32. Training used the AdamW optimizer with a cosine learning-rate scheduler and bf16 precision, with Flash Attention enabled to improve efficiency.
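For reference, these hyperparameters correspond roughly to the Hugging Face TrainingArguments below. The actual run used Axolotl 0.16.0.dev0, so this is only a sketch; the output directory and the implied device count (total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices, i.e. 1 × 8 × 4 = 32) are assumptions:

```python
# Rough TrainingArguments equivalent of the reported Axolotl hyperparameters.
# NOTE: output_dir and the implied 4-device setup are assumptions; Flash Attention
# is configured at model load time, not here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sera-qwen3-8b-finetune",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=1,        # micro_batch_size
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,
)
```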