laion/Sera-4.5A-Full-T1-v3-1000-axolotl__Qwen3-8B
laion/Sera-4.5A-Full-T1-v3-1000-axolotl__Qwen3-8B is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B by laion. It was trained with the axolotl framework on the laion/Sera-4.5A-Full-T1-v3-1000 dataset and supports a context length of 32768 tokens. It is designed for general language generation tasks, building on its Qwen3 base and fine-tuning data.
Model Overview
laion/Sera-4.5A-Full-T1-v3-1000-axolotl__Qwen3-8B is an 8 billion parameter language model, fine-tuned from the base model Qwen/Qwen3-8B. This model was developed by laion using the axolotl framework, specifically version 0.16.0.dev0.
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Length: Supports a sequence length of 32768 tokens, enabling processing of long inputs.
- Training Data: Fine-tuned on the laion/Sera-4.5A-Full-T1-v3-1000 dataset, a JSONL dataset with messages formatted for chat templates.
- Training Configuration: Uses bf16 precision, flash_attention, and gradient_checkpointing for efficient training.
- Optimizer: Trained with the adamw_torch optimizer, a learning rate of 1e-05, and a cosine learning rate scheduler.
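A record in a messages-style JSONL dataset, as described above, typically holds a list of role/content turns, one JSON object per line. A minimal sketch using only Python's standard library (the exact field names and contents here are illustrative, not verified against laion/Sera-4.5A-Full-T1-v3-1000):

```python
import json

# One illustrative training record in the messages/chat-template style.
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the Qwen3 architecture."},
        {"role": "assistant", "content": "Qwen3 is a decoder-only transformer."},
    ]
}

# JSONL stores one JSON object per line, so the serialized record
# must not contain a literal newline.
line = json.dumps(record, ensure_ascii=False)
parsed = json.loads(line)
roles = [m["role"] for m in parsed["messages"]]
print(roles)  # ['system', 'user', 'assistant']
```

Frameworks like axolotl read such files line by line and apply the tokenizer's chat template to each `messages` list to build training sequences.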
Intended Use Cases
This model is suitable for general-purpose language generation and understanding tasks, benefiting from its large context window and fine-tuning on a diverse dataset. Its Qwen3 base provides a strong foundation for various applications requiring robust language capabilities.
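For inference, Qwen-family chat models expect prompts in a ChatML-style layout; in practice you would call `tokenizer.apply_chat_template()` from transformers rather than formatting by hand. The sketch below only illustrates the general shape of that layout (delimiters follow the ChatML convention; treat the details as an assumption and defer to the model's bundled chat template):

```python
# Illustrative ChatML-style prompt formatter; real inference should use
# tokenizer.apply_chat_template() so the model's own template is applied.
def format_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Open an assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```

The trailing open assistant turn is what `add_generation_prompt=True` produces in the transformers chat-template API, signaling the model to continue with its own response.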