mlfoundations-dev/stratos_new_verified_mix_sharegptformat_4nodes

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

mlfoundations-dev/stratos_new_verified_mix_sharegptformat_4nodes is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on the mlfoundations-dev/stratos_new_verified_mix_sharegptformat dataset. It targets general instruction-following tasks and inherits the broad capabilities of its Qwen2.5 base.


Model Overview

mlfoundations-dev/stratos_new_verified_mix_sharegptformat_4nodes was fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model on the mlfoundations-dev/stratos_new_verified_mix_sharegptformat dataset. The dataset name suggests a verified instruction-following mix stored in the ShareGPT conversation format, and the "_4nodes" suffix most likely refers to the multi-node training setup.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Parameter Count: 7.6 billion
  • Context Length: 131,072 tokens
  • Fine-tuning Dataset: mlfoundations-dev/stratos_new_verified_mix_sharegptformat
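
Since the fine-tuning data is in the ShareGPT conversation format, a small sketch of how such records are typically mapped to the role-based messages list used by chat-templated tokenizers may be helpful. The field names ("conversations", "from", "value") follow the common ShareGPT convention; the actual schema of this dataset may differ.

```python
# Hypothetical sketch: convert a ShareGPT-style record into an
# OpenAI-style messages list. Field names are assumptions based on the
# common ShareGPT convention, not confirmed against this dataset.

ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_messages(record):
    """Map each ShareGPT turn to a {'role', 'content'} message."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]

example = {
    "conversations": [
        {"from": "human", "value": "What is 2 + 2?"},
        {"from": "gpt", "value": "2 + 2 = 4."},
    ]
}

messages = sharegpt_to_messages(example)
# messages[0]["role"] == "user"; messages[1]["role"] == "assistant"
```

The resulting list can be passed directly to a tokenizer's chat template when preparing fine-tuning examples.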

Training Details

The model was trained with a learning rate of 1e-05 and a total batch size of 96 (a train_batch_size of 1 with gradient_accumulation_steps of 3 across 32 devices), using the AdamW optimizer with a cosine learning-rate schedule over 3 epochs. Training used Transformers 4.46.1 and PyTorch 2.3.0.
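
The batch arithmetic and the shape of the cosine schedule can be sketched in a few lines. The effective batch size uses the values reported above; the cosine function below is an illustrative schedule without warmup, and the original run may have used a warmup phase.

```python
import math

# Effective batch size from the reported configuration.
train_batch_size = 1            # per-device micro-batch
gradient_accumulation_steps = 3
num_devices = 32
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices
# 1 * 3 * 32 == 96, matching the stated total batch size

# Illustrative cosine learning-rate schedule (no warmup shown).
# peak_lr matches the reported 1e-05 learning rate.
def cosine_lr(step, total_steps, peak_lr=1e-5):
    """Decay from peak_lr at step 0 to 0 at total_steps."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))
```

At step 0 this returns the full peak learning rate, and it decays smoothly to zero by the final step.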

Intended Use Cases

This model is suitable for applications requiring a capable instruction-following LLM, particularly those benefiting from its Qwen2.5-7B-Instruct foundation and specialized fine-tuning. Its large context window of 131,072 tokens makes it well-suited for processing and generating long-form content or complex multi-turn conversations.
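
For long multi-turn conversations, the history eventually has to be trimmed to fit the context window. A minimal sketch of one common strategy (drop the oldest non-system turns first) is shown below; the word-count token estimate is a deliberate simplification, and a real implementation would measure length with the model's tokenizer.

```python
# Hypothetical history-trimming helper for multi-turn chat.
# Token counts are approximated by whitespace word count; use the
# model's tokenizer for accurate budgeting in practice.

CONTEXT_LIMIT = 131_072  # tokens, per the context length stated above

def approx_tokens(text):
    return len(text.split())

def trim_history(messages, limit=CONTEXT_LIMIT):
    """Drop oldest non-system turns until the conversation fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = lambda: sum(approx_tokens(m["content"]) for m in system + rest)
    while rest and total() > limit:
        rest.pop(0)  # discard the oldest user/assistant turn
    return system + rest
```

The system prompt is always retained, so the model keeps its instructions even after early turns are discarded.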