mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_baseline

Precision: FP8
License: apache-2.0

Model Overview

mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_baseline is a 7.6-billion-parameter language model fine-tuned from the base model Qwen/Qwen2.5-7B-Instruct. It was trained on the dataset of the same name, mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_baseline, which suggests a specialized focus, likely on mathematical problem-solving or related analytical tasks.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: A substantial context window of 131072 tokens, enabling the model to process and reason over very long inputs.
  • Training Details: Fine-tuned for 3 epochs with a learning rate of 1e-05, a total batch size of 96, and a cosine learning rate scheduler.
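The cosine scheduler mentioned above decays the learning rate smoothly from its peak toward zero over the course of training. A minimal sketch of that schedule, assuming no warmup phase and a floor of zero (neither detail is stated on the card):

```python
import math

PEAK_LR = 1e-05  # learning rate from the training details above

def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr as step goes from 0 to total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))    # 1e-05 (peak at the start of training)
print(cosine_lr(500, 1000))  # 5e-06 (halfway down the cosine curve)
```

In practice this schedule is typically selected in the training framework (e.g. `lr_scheduler_type="cosine"` in Hugging Face `TrainingArguments`) rather than implemented by hand.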

Potential Use Cases

Given its fine-tuning on the "seed_math_multiple_samples_scale_up_scaredy_cat_baseline" dataset, this model is likely best suited for:

  • Tasks involving mathematical reasoning and problem-solving.
  • Applications requiring deep contextual understanding over extended text, due to its large context window.
  • Research and development in areas where the specific training dataset's characteristics are relevant.
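For the long-context use case above, a rough token-budget check against the 131072-token window can be sketched as follows (the helper name and token counts are hypothetical; real counts would come from the model's tokenizer):

```python
MAX_CONTEXT = 131072  # context window stated on the model card

def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check that a prompt plus its planned generation fits the window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

print(fits_in_context(120_000, 10_000))  # True  (130000 <= 131072)
print(fits_in_context(120_000, 12_000))  # False (132000 > 131072)
```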