laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v6

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 24, 2026 · Architecture: Transformer

laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v6 is an 8 billion parameter language model based on the Qwen3 architecture, fine-tuned by LAION using Axolotl. This model was specifically trained with an extended sequence length of 32768 tokens to improve stability and performance in long, multi-turn conversational contexts, addressing issues like degenerate token generation. It is optimized for robust handling of extensive context, making it suitable for applications requiring deep conversational memory or processing large inputs.


Model Overview

laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v6 is an 8 billion parameter language model built upon the Qwen3 architecture. This model was fine-tuned by LAION using the Axolotl framework, specifically addressing stability issues encountered in previous iterations (Sera v3) during long, multi-turn conversations, particularly when processing large tool observations.
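
For reference, below is a minimal inference sketch using the Hugging Face transformers library. The repository id is taken from this page's title and is assumed to resolve on the Hub; the dtype and device-placement choices are illustrative and not prescribed by the model card.

```python
# Minimal inference sketch (assumes the checkpoint is published on the
# Hugging Face Hub under the repo id below; adjust if it is hosted elsewhere).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # spread layers across available GPUs/CPU
)

messages = [{"role": "user", "content": "Summarize the key ideas of attention in transformers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```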

Key Capabilities

  • Extended Context Handling: Trained with a sequence_len of 32768 tokens, significantly enhancing its ability to maintain coherence and stability over extensive conversational histories or large input contexts (see the long-context sketch after this list).
  • Improved Multi-Turn Stability: The training regimen, which included increasing the dataset size and number of epochs, aimed to prevent degenerate token generation (e.g., "4.4.4.4…" or "for-the-for-the…") that occurred in earlier versions with long contexts.
  • Axolotl Framework: Fine-tuned with Axolotl version 0.16.0.dev0.
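
As referenced above, here is a sketch of feeding a long multi-turn history to the model, reusing the tokenizer and model objects from the previous example. The 32768-token budget corresponds to the training sequence_len; the truncation strategy (dropping the oldest turns) is an assumption for illustration, not something the model card specifies.

```python
# Long-context sketch: keep the chat history within the 32768-token budget
# used during training. The oldest-turn-dropping policy below is only an
# illustrative assumption.
MAX_CONTEXT = 32768

def build_inputs(history, tokenizer, max_context=MAX_CONTEXT):
    """Tokenize a chat history, dropping the oldest turns until it fits."""
    while True:
        ids = tokenizer.apply_chat_template(
            history, add_generation_prompt=True, return_tensors="pt"
        )
        if ids.shape[-1] <= max_context or len(history) <= 1:
            return ids
        history = history[1:]  # drop the oldest message and retry

history = [
    {"role": "user", "content": "Here is a large tool observation: ..."},
    {"role": "assistant", "content": "Acknowledged, continuing the analysis."},
    {"role": "user", "content": "Given everything above, what changed between runs?"},
]
input_ids = build_inputs(history, tokenizer).to(model.device)
reply = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(reply[0][input_ids.shape[-1]:], skip_special_tokens=True))
```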

Training Details

The model was trained for 6 epochs with a learning rate of 1e-05 and a total batch size of 32, using a cosine learning rate scheduler. Training ran for 109 optimizer steps with gradient accumulation set to 8. The base model for this fine-tune was Qwen/Qwen3-8B.
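
As a back-of-the-envelope check, the figures above combine as follows. The per-device micro-batch size and device count are not stated in this section, so the values below are assumptions chosen only to satisfy the reported total batch size of 32.

```python
# Sanity-check of the reported training setup. Only epochs, total batch size,
# gradient accumulation, and step count come from the model card; the
# micro-batch size and device count are assumptions for illustration.
epochs = 6
total_batch_size = 32
grad_accum_steps = 8
total_optimizer_steps = 109

micro_batch_size = 2   # assumption
num_devices = 2        # assumption
assert micro_batch_size * num_devices * grad_accum_steps == total_batch_size

# Implied dataset size (ignoring any partial final batch):
approx_samples = total_optimizer_steps * total_batch_size / epochs
print(f"~{approx_samples:.0f} training samples implied by 109 steps over 6 epochs")
```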