mlfoundations-dev/stratos_unverified_mix_2nodes

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · License: apache-2.0 · Architecture: Transformer · Open weights

The mlfoundations-dev/stratos_unverified_mix_2nodes model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. Developed by mlfoundations-dev, it supports a 131,072-token context length. It was fine-tuned on the mlfoundations-dev/stratos_unverified_mix dataset, suggesting specialization for the tasks represented in that dataset.


Model Overview

The stratos_unverified_mix_2nodes model is a 7.6 billion parameter language model developed by mlfoundations-dev. As a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, it inherits that model's instruction-following capabilities. What distinguishes it is its training on the mlfoundations-dev/stratos_unverified_mix dataset, which points to a focus on the data characteristics or tasks represented there.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: Supports a substantial context window of 131,072 tokens.
  • Training Data: Specialized training on the mlfoundations-dev/stratos_unverified_mix dataset.
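Because the base model is Qwen2.5-7B-Instruct, prompts are expected to follow Qwen's ChatML-style format. The sketch below assembles a single-turn prompt by hand; this assumes the fine-tune kept the base model's chat template (the `build_chatml_prompt` helper is illustrative, not part of any library — in practice, check the chat template shipped in the model repository's tokenizer config).

```python
# Sketch of the ChatML-style prompt format used by Qwen2.5-Instruct models.
# Assumption: this fine-tune inherits the base model's chat template.

def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt string (illustrative helper)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize your training data in one sentence.",
)
print(prompt)
```

In real use, `tokenizer.apply_chat_template` from the Transformers library handles this formatting automatically when the template is present in the repo.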

Training Details

The model was trained with a learning rate of 1e-05 and a total batch size of 96 (a per-device batch size of 1 across 16 devices with 6 gradient accumulation steps), using a cosine learning rate scheduler with a warmup ratio of 0.1 over 3 epochs. Training used Transformers 4.46.1, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3.
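The reported hyperparameters can be laid out as they would appear in a Hugging Face `TrainingArguments` call. This is a sketch: the field names follow the Transformers API, the values come from the training details above, and `num_devices` is not a `TrainingArguments` field but the distributed world size implied by the total batch size.

```python
# Training hyperparameters as reported on the model card (sketch).
hparams = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 1,  # implied: 96 / (16 devices * 6 accum steps)
    "gradient_accumulation_steps": 6,
    "num_devices": 16,                 # distributed world size, not a TrainingArguments field
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3,
}

# Effective (total) batch size seen by the optimizer per update step:
effective_batch = (
    hparams["per_device_train_batch_size"]
    * hparams["num_devices"]
    * hparams["gradient_accumulation_steps"]
)
print(effective_batch)  # 96
```

The arithmetic is the standard one: total batch = per-device batch × number of devices × gradient accumulation steps, which is how the card's figure of 96 decomposes.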