laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained with Axolotl on the laion/Sera-4.6-Lite-T2-v4-316 dataset, which incorporates pre-rendered tool calls in the Hermes/Qwen3 wire format. The model targets tool use and structured interactions and supports a 32,768-token context length.


Model Overview

This model, laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B, is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B using the Axolotl framework on the laion/Sera-4.6-Lite-T2-v4-316 dataset. A key characteristic of the training data is that tool calls are pre-rendered in the Hermes/Qwen3 wire format directly into the message content, rather than supplied as separate structured fields.
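For illustration, a minimal sketch of what such pre-rendered turns typically look like under the Hermes/Qwen3 convention; the get_weather call and the tool response values are hypothetical:

```python
# Hypothetical assistant turn with a pre-rendered tool call in the
# Hermes/Qwen3 wire format: the call is serialized as JSON inside
# <tool_call>...</tool_call> tags directly in the message content.
assistant_content = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "Berlin"}}\n'
    "</tool_call>"
)

# The corresponding tool result is typically wrapped in <tool_response>
# tags on a following turn (names and values here are illustrative).
tool_response_content = (
    "<tool_response>\n"
    '{"temperature_c": 18, "condition": "cloudy"}\n'
    "</tool_response>"
)
```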

Key Capabilities

  • Tool Call Integration: Trained on examples where tool calls are pre-rendered into the message content, which should help the model both interpret and emit structured function calls.
  • Qwen3-8B Base: Inherits the strong foundational capabilities of the Qwen3-8B base model.
  • Extended Context Window: Uses a sequence_len of 32768 tokens, enabling long inputs and sustained context over extended dialogues or documents (see the inference sketch after this list).
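A minimal inference sketch, assuming the checkpoint loads with Hugging Face transformers and that the tokenizer ships the standard Qwen3 chat template with tools= support; the get_weather schema is hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Hypothetical tool schema passed through the chat template.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens; keep special tokens so any
# <tool_call> tags remain visible in the output.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=False))
```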

Training Details

The model was trained with a learning rate of 1e-05 using the AdamW optimizer. The total batch size was 32 across 4 GPUs with gradient accumulation steps set to 8, and training ran for 17 optimizer steps over 3 epochs under a cosine learning-rate schedule with a warmup ratio of 0.1875. The arithmetic behind these numbers is sketched below.
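A back-of-envelope check, assuming a per-device micro-batch size of 1 (inferred from 4 × 8 × 1 = 32; the card does not state it explicitly):

```python
# Effective batch size from the stated distributed-training settings.
num_gpus = 4
grad_accum_steps = 8
micro_batch_size = 1  # inferred, not stated on the card

effective_batch = num_gpus * grad_accum_steps * micro_batch_size
assert effective_batch == 32  # matches the reported total batch size

# Warmup ratio of 0.1875 over 17 total steps gives roughly 3 warmup steps.
total_steps = 17
warmup_ratio = 0.1875
warmup_steps = round(total_steps * warmup_ratio)  # 3.1875 -> 3
print(effective_batch, warmup_steps)
```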

Good For

  • Applications requiring tool use or function calling capabilities (a sketch for parsing generated tool calls follows this list).
  • Tasks benefiting from a large context window.
  • Developers looking for a fine-tuned Qwen3-8B variant with specialized training for structured outputs.
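Since the model emits tool calls as in-band tags, a caller needs to recover them from the generated text. A minimal parsing sketch, assuming the Hermes/Qwen3 convention of JSON wrapped in <tool_call>...</tool_call> tags (robust code would also handle truncated or malformed tags):

```python
import json
import re

# Matches a JSON object wrapped in Hermes/Qwen3-style tool-call tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip calls the model failed to serialize cleanly
    return calls

sample = (
    '<tool_call>\n'
    '{"name": "get_weather", "arguments": {"location": "Berlin"}}\n'
    '</tool_call>'
)
print(extract_tool_calls(sample))
# [{'name': 'get_weather', 'arguments': {'location': 'Berlin'}}]
```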