laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v7

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Apr 25, 2026 · Architecture: Transformer

laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v7 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B with a 32768 token context length. This model was trained using Axolotl on the laion/Sera-4.6-Lite-T2-v4-1000 dataset, specifically addressing issues with long multi-turn contexts and tool observations exceeding 20KB. It is optimized for stability in extended conversational interactions where previous versions experienced token degeneration.


Model Overview

This model, laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v7, is an 8 billion parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was developed using the Axolotl framework, with a focus on improving performance in long, multi-turn conversational contexts.
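
As a minimal usage sketch, the snippet below loads the checkpoint with Hugging Face Transformers and runs a single chat turn. The dtype, device placement, and generation settings are illustrative assumptions, not values documented for this model.

```python
# Minimal loading/generation sketch; generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # actual precision depends on your hardware and runtime
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the last tool output."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```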

Key Differentiator

The primary goal of this fine-tuning was to resolve issues observed in previous Sera versions, where the model's output would degrade (e.g., "4.4.4.4…" or "for-the-for-the…") when encountering tool observations larger than approximately 20KB within a multi-turn conversation. The training dataset was scaled up significantly (from 316 to 1000 rows) and the number of epochs increased to enhance the model's stability and prevent token degeneration in extended contexts.
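
The sketch below illustrates, hypothetically, the kind of prompt this fine-tune targets: a multi-turn chat in which one turn carries a tool observation larger than 20 KB. The message contents and the choice to fold the observation into a user turn are assumptions; only the model identifier and the 32768-token limit come from this card.

```python
# Hypothetical stress case: a multi-turn chat containing a >20 KB tool observation.
from transformers import AutoTokenizer

model_id = "laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v7"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Simulate a roughly 25 KB tool observation (contents are illustrative).
big_observation = '{"level": "ERROR", "msg": "synthetic log line"}\n' * 550

messages = [
    {"role": "user", "content": "Fetch the deployment logs and summarize the failures."},
    {"role": "assistant", "content": "Calling the log-retrieval tool now."},
    # Folded into a user turn to stay chat-template-agnostic; some templates
    # also accept a dedicated "tool" role for observations.
    {"role": "user", "content": f"Tool observation:\n{big_observation}\nSummarize the failures."},
]

prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(f"observation bytes: {len(big_observation.encode('utf-8'))}")
print(f"prompt tokens: {len(prompt_ids)} (must stay under the 32768-token limit)")
```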

Training Details

  • Base Model: Qwen/Qwen3-8B
  • Dataset: laion/Sera-4.6-Lite-T2-v4-1000 (a JSONL dataset)
  • Sequence Length: 32768 tokens
  • Learning Rate: 1e-05
  • Optimizer: AdamW with betas=(0.9, 0.95)
  • Epochs: 12
  • Gradient Accumulation Steps: 8
  • Total Training Steps: 218
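
As a rough consistency check (assuming each optimizer step covers fresh examples and no sample packing is applied; the per-device micro-batch size is not stated in this card and is inferred here), the listed figures imply an effective batch size of roughly 55 sequences per optimizer step:

```python
# Back-of-envelope check of the listed hyperparameters.
# The per-device micro-batch size and GPU count are not documented,
# so the last figure is an inference, not a stated value.
rows = 1000            # dataset rows
epochs = 12
total_steps = 218      # optimizer steps
grad_accum = 8

examples_seen = rows * epochs
effective_batch = examples_seen / total_steps          # examples per optimizer step
micro_batch_x_devices = effective_batch / grad_accum   # sequences per forward pass, summed over devices

print(f"examples seen:         {examples_seen}")          # 12000
print(f"effective batch size:  {effective_batch:.1f}")    # ~55.0
print(f"micro-batch x devices: {micro_batch_x_devices:.1f}")  # ~6.9
```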

Intended Use Cases

This model is particularly suited for applications requiring stable and coherent responses in long, multi-turn dialogues, especially those involving the integration of large tool observations or complex contextual information. Its enhanced stability in extended contexts makes it a robust choice for agents or conversational AI systems that handle detailed interactions.