mlfoundations-dev/fasttext_mixing_domains_top_3_code
The mlfoundations-dev/fasttext_mixing_domains_top_3_code model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the mlfoundations-dev/fasttext_mixing_domains_top_3_code dataset, which suggests it is optimized for code-related and domain-specific text processing. The model supports a 131,072-token context length, making it suitable for handling extensive inputs in its specialized domain.
Model Overview
This model, fasttext_mixing_domains_top_3_code, is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model. It has 7.6 billion parameters and supports a context length of 131,072 tokens, enabling it to process very long sequences of text.
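As with other Qwen2.5-based checkpoints on the Hub, the model should load through the standard transformers Auto classes. The snippet below is a minimal sketch; the dtype and device placement are illustrative choices, not settings documented in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/fasttext_mixing_domains_top_3_code"

# Load the tokenizer and weights via the standard Auto classes.
# bfloat16 and device_map="auto" are illustrative, not from the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```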
Training Details
The model was fine-tuned on the mlfoundations-dev/fasttext_mixing_domains_top_3_code dataset. Key training hyperparameters, reconstructed in the configuration sketch after this list, included:
- Learning Rate: 1e-05
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: An effective training batch size of 96 (1 per device × 8 GPUs × 12 gradient accumulation steps)
- Epochs: 3.0
- LR Scheduler: Cosine with a 0.1 warmup ratio
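These hyperparameters map directly onto a Hugging Face TrainingArguments configuration. The sketch below reconstructs them for reference; the output directory is a placeholder, and any setting not listed above keeps its transformers default:

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; output_dir is a
# placeholder, not the directory used in the actual training run.
training_args = TrainingArguments(
    output_dir="fasttext_mixing_domains_top_3_code",
    learning_rate=1e-5,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    per_device_train_batch_size=1,   # 1 per device × 8 GPUs × 12 accumulation = 96
    gradient_accumulation_steps=12,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```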
Potential Use Cases
Given its fine-tuning on a dataset related to "mixing domains" and "code," this model is likely optimized for the following (a prompting sketch appears after the list):
- Processing and understanding code-related text.
- Tasks involving the integration or analysis of information from diverse domains, particularly those with a coding component.
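For code-oriented prompting, Qwen2.5-Instruct derivatives use a chat template. The example below continues from the loading sketch in the overview section; the prompt itself is a hypothetical illustration:

```python
# Assumes `tokenizer` and `model` are loaded as in the overview sketch above.
messages = [
    {"role": "user", "content": "Write a Python function that merges two sorted lists."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a completion and decode only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```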
The original model card notes that further details on specific intended uses, limitations, and the training and evaluation data are not yet available.