mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all

Model Overview

This model, mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all, is a 7.6 billion parameter language model fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It supports a context length of up to 131072 tokens, enabling it to process and generate long text sequences. A minimal loading sketch follows the characteristics list below.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: Supports a context window of up to 131072 tokens.
  • Tensor Type: FP8, per the repository metadata.
  • License: Apache 2.0.
  • Training Data: Fine-tuned on the mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all dataset, suggesting a focus on mathematical or complex reasoning tasks.
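
The snippet below is a minimal loading and inference sketch, assuming the standard Hugging Face transformers API (with accelerate installed for device_map="auto"). The repo id comes from this card; the dtype choice, the sample prompt, and the generation settings are illustrative assumptions rather than documented usage.

    # Minimal loading sketch; assumes the standard transformers API and an
    # installed accelerate package for device_map="auto". Untested against this repo.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all"

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,  # assumption: pick a dtype your hardware supports
        device_map="auto",
    )

    # Qwen2.5-Instruct derivatives use a chat template; apply it rather than raw strings.
    messages = [{"role": "user", "content": "Solve for x: 3x + 5 = 20."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))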

Training Details

The model was trained using the following key hyperparameters (a hypothetical configuration sketch follows the list):

  • Learning Rate: 1e-05
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
  • Batch Size: A total training batch size of 96 (1 sample per device × 12 gradient accumulation steps × 8 GPUs).
  • Epochs: 3.0.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
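
For readers who want to mirror this setup, the following is a hypothetical reconstruction using the transformers TrainingArguments API. The hyperparameter values are copied from the list above; the output_dir, the mixed-precision setting, and the choice of the Trainer stack itself are assumptions, since the card does not state which training framework was used.

    # Hypothetical reconstruction of the listed hyperparameters using
    # transformers.TrainingArguments; the actual training stack is not stated on the card.
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="seed_math_finetune",  # assumed name, not from the card
        learning_rate=1e-5,
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        per_device_train_batch_size=1,    # x12 accumulation x8 GPUs = 96 effective
        gradient_accumulation_steps=12,
        num_train_epochs=3.0,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        bf16=True,                        # assumption: precision is not stated on the card
    )

    # The x8 multiplier comes from launching 8 data-parallel processes
    # (e.g. via torchrun), not from TrainingArguments itself.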

Potential Use Cases

Given its fine-tuning dataset and large context window, this model is likely suitable for applications requiring:

  • Processing and understanding long documents or complex problem descriptions (see the context-window check after this list).
  • Tasks involving mathematical reasoning or data analysis, depending on the specifics of the seed_math_multiple_samples_scale_up_scaredy_cat_all dataset.
  • Applications where the ability to maintain coherence over extended text is crucial.
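
One practical caveat: Qwen2.5 derivatives vary in how much of the 131072-token window is enabled out of the box (the base model ships with a 32768-token default and reaches 131072 via RoPE scaling), so it is worth confirming the configured limit before relying on very long inputs. A small check, assuming the standard AutoConfig API:

    # Check the configured maximum context before sending very long inputs.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(
        "mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_all"
    )
    print(config.max_position_embeddings)  # this card advertises 131072; verify locally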