mlfoundations-dev/test_tacc_stratos_verified_mix

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 1, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

mlfoundations-dev/test_tacc_stratos_verified_mix is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct by mlfoundations-dev. It was trained on the mlfoundations-dev/stratos_verified_mix dataset with a context length of 32768 tokens, and is intended for general language understanding and generation tasks, combining the capabilities of its base model with its specialized fine-tuning data.


Overview

This model, test_tacc_stratos_verified_mix, is a 7.6 billion parameter language model derived from Qwen/Qwen2.5-7B-Instruct. It was fine-tuned by mlfoundations-dev on the mlfoundations-dev/stratos_verified_mix dataset, indicating specialization toward the characteristics of that data mix. The model supports a substantial context length of 32768 tokens.

Key Capabilities

  • General Language Understanding: Inherits robust language comprehension from its Qwen2.5-7B-Instruct base.
  • Contextual Processing: Capable of handling long inputs and maintaining coherence over 32768 tokens.
  • Specialized Adaptation: Fine-tuned on a unique dataset, suggesting potential strengths in areas covered by stratos_verified_mix.
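As a Qwen2.5-7B-Instruct derivative, the model should load with the standard Hugging Face Transformers workflow. The sketch below is illustrative rather than taken from the model card: the `generate` helper and prompt handling are assumptions, and it presumes the model ships the base model's chat template.

```python
MODEL_ID = "mlfoundations-dev/test_tacc_stratos_verified_mix"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a completion for a single user prompt.

    A minimal sketch; imports are kept inside the function so the file can be
    inspected without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Qwen2.5-Instruct derivatives use a chat template; apply it to the prompt.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate("...")` downloads roughly 15 GB of FP8/BF16 weights on first use, so a GPU with sufficient memory (or `device_map="auto"` offloading) is assumed.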

Training Details

The model was trained with a learning rate of 8e-05 for 3 epochs, with a total batch size of 512 (a per-device train_batch_size of 1 with gradient_accumulation_steps of 16 across 32 GPUs). It used the AdamW_Torch optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1. Training used Transformers 4.46.1 and PyTorch 2.5.1.
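The effective batch size follows from per-device batch size × gradient accumulation steps × GPU count (1 × 16 × 32 = 512). The warmup-plus-cosine schedule can be sketched as below; this is a minimal approximation, and the exact Transformers scheduler may differ slightly in step accounting.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 8e-5,
              warmup_ratio: float = 0.1, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr over the first warmup_ratio of training,
    then cosine decay to min_lr over the remaining steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For example, the learning rate is 0 at step 0, reaches the 8e-05 peak at the end of warmup, and decays to 0 by the final step.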