mlfoundations-dev/DCFT-Stratos-Verified-114k-7B-4gpus-systemprompt-packing
mlfoundations-dev/DCFT-Stratos-Verified-114k-7B-4gpus-systemprompt-packing is a 7.6 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct on the mlfoundations-dev/stratos_verified_mix dataset. It is intended for tasks that require robust instruction following and system prompt adherence.
Model Overview
This model, named DCFT-Stratos-Verified-114k-7B-4gpus-systemprompt-packing, is a 7.6 billion parameter language model. It is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct architecture, developed by mlfoundations-dev.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
- Training Data: Fine-tuned on the mlfoundations-dev/stratos_verified_mix dataset.
- Training Configuration: Trained with a learning rate of 1e-05, a total batch size of 96, and 3 epochs, using a multi-GPU setup with 32 devices.
- Context Length: Features a context length of 131,072 tokens. A minimal loading sketch follows this list.
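The snippet below is a minimal loading sketch, assuming the standard Hugging Face transformers API used for Qwen2.5-derived checkpoints; the repository id is taken from this card, while the dtype and device-placement choices are illustrative rather than recommendations from the developers.

```python
# Minimal loading sketch (assumes transformers >= 4.37 and, for device_map, accelerate).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/DCFT-Stratos-Verified-114k-7B-4gpus-systemprompt-packing"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick a suitable precision (e.g. bfloat16) automatically
    device_map="auto",    # shard the 7.6B parameters across available GPUs
)
```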
Intended Use Cases
While the model card does not detail specific intended uses or limitations, fine-tuning on a 'verified mix' dataset suggests potential strengths in:
- Instruction Following: Likely enhanced capabilities in adhering to complex instructions.
- System Prompt Adherence: Optimized for scenarios where system prompts play a critical role in guiding model behavior (a brief generation sketch follows this list).
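As a hedged illustration of system-prompt-driven generation, the example below continues from the loading sketch above and assumes the model inherits the Qwen2.5-Instruct chat template; the prompts are placeholders, not settings confirmed by the developers.

```python
# Generation sketch with an explicit system prompt; chat-template support is assumed,
# not confirmed by this card.
messages = [
    {"role": "system", "content": "You are a concise assistant. Answer in at most two sentences."},
    {"role": "user", "content": "What does instruction tuning change about a base language model?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```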
Further details on specific applications and performance metrics would require additional information from the model developers.