mlfoundations-dev/test_tacc_stratos_verified_mix
mlfoundations-dev/test_tacc_stratos_verified_mix is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct by mlfoundations-dev. It was trained on the mlfoundations-dev/stratos_verified_mix dataset with a context length of 32768 tokens. The model is designed for general language understanding and generation tasks, combining its base architecture with specialized fine-tuning data.
Overview
This model, test_tacc_stratos_verified_mix, is a 7.6 billion parameter language model derived from the Qwen/Qwen2.5-7B-Instruct architecture. It has been specifically fine-tuned by mlfoundations-dev using the mlfoundations-dev/stratos_verified_mix dataset, indicating a specialization towards the characteristics of this particular data mix. The model supports a substantial context length of 32768 tokens.
Key Capabilities
- General Language Understanding: Inherits robust language comprehension from its Qwen2.5-7B-Instruct base.
- Contextual Processing: Capable of handling long inputs and maintaining coherence over 32768 tokens.
- Specialized Adaptation: Fine-tuned on a unique dataset, suggesting potential strengths in areas covered by stratos_verified_mix.
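A minimal inference sketch with Hugging Face Transformers is shown below. The repo id comes from this card; the system prompt, generation settings, and helper function names are illustrative assumptions, not part of the released model.

```python
MODEL_ID = "mlfoundations-dev/test_tacc_stratos_verified_mix"


def build_messages(user_prompt: str) -> list[dict]:
    """Build a chat-format message list, as used by Qwen2.5 chat templates.

    The system prompt here is an illustrative placeholder.
    """
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a reply (downloads weights on first use)."""
    # Imported lazily so the chat-formatting helper above can be used
    # without the (large) model dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the messages with the model's chat template, then tokenize.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because the base model is instruction-tuned, inputs should go through the chat template rather than being passed as raw text.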
Training Details
The model was trained for 3 epochs with a learning rate of 8e-05 and an effective batch size of 512 (per-device train_batch_size of 1 × gradient_accumulation_steps of 16 × 32 GPUs). It used the AdamW_Torch optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1. Training ran on Transformers 4.46.1 and PyTorch 2.5.1.
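The effective batch size above follows directly from the reported per-device batch size, gradient accumulation, and GPU count; a quick check:

```python
# Effective batch size = per-device batch × grad accumulation steps × GPUs,
# using the values reported in the training details above.
per_device_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 32

effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # → 512
```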