laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B

Text Generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Apr 22, 2026 · Architecture: Transformer

laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained with Axolotl on the laion/Sera-4.5A-Full-T1-v3-316 dataset and supports a context length of 32768 tokens. The model is optimized for chat applications and uses the ChatML template for instruction following.


Overview

This model is built on the Qwen3-8B architecture and fine-tuned with the Axolotl framework on the laion/Sera-4.5A-Full-T1-v3-316 dataset. Its 32768-token context window allows it to process and generate long, coherent sequences.
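
As a quick orientation, the snippet below shows one way to run the model for chat-style generation with Hugging Face Transformers. It is a minimal sketch: it assumes the model id above is available on the Hub, that the tokenizer ships the ChatML chat template mentioned in this card, and that bfloat16 weights fit on your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

# The tokenizer's chat template (ChatML, per this card) formats the turns.
messages = [{"role": "user",
             "content": "Explain gradient accumulation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```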

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Fine-tuning Framework: Axolotl (version 0.16.0.dev0)
  • Dataset: laion/Sera-4.5A-Full-T1-v3-316
  • Chat Template: chatml (ChatML), used for instruction following.
  • Learning Rate: 1e-05
  • Optimizer: AdamW with betas (0.9, 0.95)
  • Gradient Accumulation: 8 steps, for an effective batch size of 32 (see the sketch after this list).
  • Flash Attention: Enabled for improved efficiency.
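
To make the optimizer and batching settings concrete, here is an illustrative PyTorch sketch of AdamW with the betas above plus gradient accumulation. It is not Axolotl's actual training loop; the stand-in model, data, and micro-batch size of 4 are assumptions (4 sequences per step × 8 accumulation steps = the effective batch size of 32).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data so the sketch runs end to end (assumptions).
model = nn.Linear(16, 1)
data = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
loader = DataLoader(data, batch_size=4)  # micro-batch of 4 (assumed)

# Optimizer settings from the list above: lr 1e-5, AdamW betas (0.9, 0.95).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, betas=(0.9, 0.95))
loss_fn = nn.MSELoss()
ACCUM_STEPS = 8  # 4 per step x 8 steps = effective batch size of 32

for step, (x, y) in enumerate(loader):
    # Scale the loss so accumulated gradients average over the full batch.
    loss = loss_fn(model(x), y) / ACCUM_STEPS
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```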

Intended Use Cases

The README does not spell out intended uses and limitations, but fine-tuning on a chat-oriented dataset with the ChatML template suggests the model is suited to:

  • Conversational AI: Developing chatbots or virtual assistants.
  • Instruction Following: Executing complex multi-turn instructions.
  • Long Context Tasks: Applications that require understanding or generating extended passages, aided by the 32768-token context window (see the sketch below).
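
For long-context work, it helps to verify that a prompt actually fits within the 32768-token window before generating. The sketch below is a hypothetical check, assuming the model id above and leaving headroom for the reply; the document variable is a placeholder for your own input.

```python
from transformers import AutoTokenizer

MODEL_ID = "laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B"
CTX_LEN = 32768  # context length reported for this model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

long_document = "..."  # placeholder: your own long input text
messages = [{"role": "user",
             "content": f"Summarize the following document:\n\n{long_document}"}]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
n_tokens = len(tokenizer(prompt)["input_ids"])

# Leave headroom for the generated summary itself.
max_new_tokens = 512
if n_tokens + max_new_tokens > CTX_LEN:
    raise ValueError(
        f"Prompt uses {n_tokens} tokens; only "
        f"{CTX_LEN - max_new_tokens} fit before generation.")
```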