mlfoundations-dev/DCFT-Stratos-Verified-114k-32B-4gpus
  • Task: Text generation
  • Concurrency cost: 2
  • Model size: 32.8B
  • Quant: FP8
  • Context length: 32k
  • Published: Jan 27, 2025
  • License: apache-2.0
  • Architecture: Transformer (open weights)

DCFT-Stratos-Verified-114k-32B-4gpus is a 32-billion-parameter language model fine-tuned by mlfoundations-dev from Qwen/Qwen2.5-32B-Instruct on the mlfoundations-dev/stratos_verified_mix dataset. Building on the Qwen2.5 architecture, it is intended for tasks that benefit from additional instruction-tuning on this specialized corpus.


Overview

This model, DCFT-Stratos-Verified-114k-32B-4gpus, is a fine-tuned iteration of the Qwen/Qwen2.5-32B-Instruct base model. Developed by mlfoundations-dev, it leverages the robust capabilities of the 32 billion parameter Qwen2.5 architecture.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-32B-Instruct.
  • Fine-tuning Dataset: mlfoundations-dev/stratos_verified_mix.
  • Training Hyperparameters: learning rate of 1e-05, total training batch size of 96, and 3 epochs, using the adamw_torch optimizer with a cosine learning-rate scheduler.
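The "-4gpus" suffix in the model name suggests training across four GPUs, which constrains how the total batch size of 96 decomposes. The sketch below is a hypothetical reconstruction: only the learning rate, total batch size, epoch count, optimizer, and scheduler are stated above; the per-device batch size and gradient-accumulation steps are assumed values chosen to multiply out to 96.

```python
# Hypothetical reconstruction of the reported training configuration.
# Only learning_rate, the total batch size (96), epochs, optimizer, and
# scheduler are stated on the model card; the per-device split below is
# an assumed decomposition across the 4 GPUs implied by the model name.
training_config = {
    "learning_rate": 1e-5,
    "num_train_epochs": 3,
    "optim": "adamw_torch",
    "lr_scheduler_type": "cosine",
    "num_gpus": 4,                     # from the "-4gpus" suffix
    "per_device_train_batch_size": 3,  # assumption
    "gradient_accumulation_steps": 8,  # assumption
}

# Effective batch size = GPUs x per-device batch x accumulation steps.
effective_batch = (
    training_config["num_gpus"]
    * training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 96, matching the reported total batch size
```

Other splits (e.g. per-device 6 with 4 accumulation steps) would satisfy the same total; the card does not say which was used.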

Intended Use Cases

This model suits applications that call for a specialized instruction-tuned model, particularly workloads resembling the stratos_verified_mix dataset. Its Qwen2.5-32B-Instruct foundation provides general language understanding and generation, with improved performance expected on data similar to its fine-tuning corpus.
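Since the model inherits Qwen2.5-32B-Instruct's chat template, prompts follow the ChatML convention (`<|im_start|>role ... <|im_end|>`). The sketch below builds such a prompt by hand for illustration; in practice the tokenizer's `apply_chat_template` method does this for you, and the message contents here are placeholders.

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} messages in the ChatML format
    used by Qwen2.5-family models."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Trailing generation prompt: the model continues the assistant turn.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

# Illustrative messages, not from the model card.
prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen2.5 architecture."},
])
print(prompt)
```

Serving stacks that expose an OpenAI-compatible chat endpoint apply this template automatically, so manual construction is only needed for raw completion APIs.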