Model Overview
mlfoundations-dev/stratos_new_verified_mix_sharegptformat_4nodes is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It was trained on the mlfoundations-dev/stratos_new_verified_mix_sharegptformat dataset, whose name suggests a focus on instruction-following conversations stored in a ShareGPT-style format.
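The card does not document the dataset schema; as a rough orientation, a typical ShareGPT-style record looks like the Python literal below. The field names and content are illustrative assumptions and may not match the actual dataset.

```python
# Illustrative ShareGPT-style conversation record, written as a Python literal.
# The field names ("conversations", "from", "value") follow the common ShareGPT
# convention and are an assumption; the actual dataset schema may differ.
example_record = {
    "conversations": [
        {"from": "human", "value": "Explain gradient accumulation in one paragraph."},
        {"from": "gpt", "value": "Gradient accumulation sums gradients over several "
                                 "small batches before applying an optimizer step, "
                                 "emulating a larger effective batch size."},
    ]
}
```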
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Parameter Count: 7.6 billion
- Context Length: 131,072 tokens
- Fine-tuning Dataset: mlfoundations-dev/stratos_new_verified_mix_sharegptformat
Training Details
The model was trained for 3 epochs with a learning rate of 1e-05 and an effective batch size of 96 (per-device train_batch_size of 1 with gradient_accumulation_steps of 3 across 32 devices), using the AdamW optimizer with a cosine learning rate schedule. Training used Transformers 4.46.1 and PyTorch 2.3.0.
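The sketch below restates these hyperparameters as Transformers TrainingArguments. It is a minimal approximation only: the card does not specify the precision, data pipeline, or the exact multi-node launch configuration, and the output directory is a placeholder.

```python
# Minimal sketch of the reported hyperparameters as TrainingArguments.
# The output_dir is a placeholder and bf16 is an assumption; the card does not
# state the precision or the multi-node launch setup.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="stratos_finetune",          # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,          # 1 sample per device
    gradient_accumulation_steps=3,          # 1 x 3 x 32 devices = effective batch of 96
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,                              # assumption: mixed precision not stated
)
```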
Intended Use Cases
This model is intended for applications that need an instruction-following LLM, combining the Qwen2.5-7B-Instruct foundation with fine-tuning on conversational instruction data. Its 131,072-token context window makes it well suited to generating long-form content and handling extended multi-turn conversations; a minimal usage sketch is shown below.
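The inference sketch below assumes the checkpoint is available on the Hugging Face Hub under the repository name above and that it retains the Qwen2.5 chat template in its tokenizer config.

```python
# Minimal inference sketch; assumes the checkpoint is on the Hugging Face Hub under
# this repository name and keeps Qwen2.5's chat template in its tokenizer config.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/stratos_new_verified_mix_sharegptformat_4nodes"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Summarize the trade-offs of gradient accumulation."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```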