Model Overview
This model, mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all, is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It supports a context length of 131,072 tokens, allowing it to process and generate long text sequences.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
- Parameter Count: 7.6 billion parameters.
- Context Length: Supports a large context window of 131,072 tokens.
- Training Data: Fine-tuned on the mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all dataset, suggesting a focus on mathematical or complex reasoning tasks.
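For reference, below is a minimal sketch of loading the checkpoint with the Hugging Face transformers library; the repository id comes from this card, while the dtype and device-placement options are assumptions rather than documented settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from this card; dtype/device settings are assumptions.
model_id = "mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the precision stored in the checkpoint
    device_map="auto",    # spread layers over the available devices
)
```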
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 1e-05
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
- Batch Size: A total training batch size of 96 (1 per device with 12 gradient accumulation steps across 8 GPUs).
- Epochs: 3.0 epochs.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
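These hyperparameters map naturally onto the Hugging Face TrainingArguments interface. The sketch below is illustrative only: the actual training framework, output paths, and precision settings are not stated on this card.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters; output_dir and bf16
# are assumptions, and the 8-GPU setup is handled by the launcher, not shown here.
training_args = TrainingArguments(
    output_dir="seed_math_scaredy_cat_all-finetune",  # hypothetical path
    learning_rate=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    per_device_train_batch_size=1,    # 1 sample per device
    gradient_accumulation_steps=12,   # 1 * 12 * 8 GPUs = effective batch size of 96
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                        # assumption: typical for 7B-scale fine-tunes
)
```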
Potential Use Cases
Given its fine-tuning dataset and large context window, this model is likely suitable for applications requiring:
- Processing and understanding long documents or complex problem descriptions.
- Tasks involving mathematical reasoning or data analysis, depending on the specifics of the seed_math_multiple_samples_scale_up_scaredy_cat_all dataset.
- Applications where maintaining coherence over extended text is crucial.
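As a concrete illustration of the reasoning-oriented use cases above, the sketch below runs a short math prompt through the model's chat template; the prompt and generation settings are arbitrary examples, not values taken from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same checkpoint as in the loading sketch above; the prompt and generation
# settings below are arbitrary examples.
model_id = "mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Qwen2.5-Instruct derivatives ship a chat template, so prompts go through it.
messages = [
    {"role": "user",
     "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```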