laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B
laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained with Axolotl on the laion/Sera-4.5A-Full-T1-v3-316 dataset and supports a context length of 32768 tokens. The model is optimized for chat applications and uses the chatml template for instruction following.
Overview
This model, laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B, is an 8-billion-parameter language model built on the Qwen3-8B architecture and fine-tuned with the Axolotl framework on the laion/Sera-4.5A-Full-T1-v3-316 dataset. Its 32768-token context window lets it process and generate longer, more coherent sequences.
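For reference, the snippet below shows one way to load the model and generate a response through its chat template. This is a minimal sketch assuming the checkpoint works with standard transformers APIs; the dtype, device placement, and generation settings are illustrative choices, not values from this card.

```python
# Minimal inference sketch; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the available hardware
    device_map="auto",
)

# The model was trained with the chatml template, so build the prompt
# through the tokenizer's chat template rather than raw strings.
messages = [{"role": "user", "content": "Summarize the benefits of a 32768-token context window."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```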
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Fine-tuning Framework: Axolotl (version
0.16.0.dev0) - Dataset:
laion/Sera-4.5A-Full-T1-v3-316 - Chat Template: Utilizes the
chatmlformat for instruction following. - Learning Rate:
1e-05 - Optimizer: AdamW with betas
(0.9, 0.95) - Gradient Accumulation: 8 steps, resulting in a total batch size of 32.
- Flash Attention: Enabled for improved efficiency.
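These hyperparameters map onto a standard PyTorch setup. The sketch below is hypothetical: only the learning rate, AdamW betas, and the 8-step accumulation factor come from the list above, while the tiny model and dummy data are placeholders standing in for the real training stack.

```python
# Hypothetical PyTorch sketch of the listed training hyperparameters.
# Only lr, betas, and the accumulation factor come from the card;
# the tiny model and dummy data are placeholders.
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the 8B model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-05, betas=(0.9, 0.95))

accum_steps = 8  # 8 micro-batches of 4 accumulate to the effective batch of 32
data = [(torch.randn(4, 16), torch.randn(4, 16)) for _ in range(32)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    # Scale the loss so the accumulated gradient matches one batch of 32.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```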
Intended Use Cases
The README does not detail specific intended uses or limitations, but fine-tuning on a chat-oriented dataset with the chatml template suggests suitability for:
- Conversational AI: Developing chatbots or virtual assistants.
- Instruction Following: Executing complex, multi-turn instructions (see the prompt sketch after this list).
- Long Context Tasks: Applications requiring understanding or generation over extended text passages, benefiting from its 32768-token context window.
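To illustrate the multi-turn case, the sketch below serializes a short conversation through the model's chatml template. The conversation content is invented for demonstration; only the model id and the use of chatml come from this card.

```python
# Illustrative multi-turn chatml prompt; the conversation text is made up.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("laion/Sera-4.5A-Full-T1-v3-316-axolotl__Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Outline a report on long-context language models."},
    {"role": "assistant", "content": "1. Motivation  2. Techniques  3. Evaluation"},
    {"role": "user", "content": "Expand point 2 with concrete techniques."},
]

# With a chatml template, each turn is wrapped in <|im_start|>role ... <|im_end|>
# markers, and add_generation_prompt opens the assistant turn for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```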