mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_test
Text Generation · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Feb 11, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

The mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_test model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, developed by mlfoundations-dev. It was trained with a learning rate of 1e-05 and a total batch size of 96 over 3 epochs, using a cosine learning rate scheduler. Its primary differentiation and specific use cases are not detailed in the available information, but it is built on a 7.6-billion-parameter architecture.


Overview

The seed_math_multiple_samples_scale_up_scaredy_cat_test model is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model. Developed by mlfoundations-dev, it inherits the architecture of Qwen2.5-7B-Instruct, a 7.6-billion-parameter instruction-tuned language model.
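Since the checkpoint is a fine-tune of Qwen2.5-7B-Instruct, it can presumably be queried through Hugging Face `transformers` with the standard Qwen chat template. The sketch below is an assumption based on the base model's usual interface, not a documented usage pattern from this model card:

```python
# Hedged sketch: querying the checkpoint with transformers, assuming it
# keeps the standard Qwen2.5-Instruct chat template (not confirmed by the card).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_test"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Build the chat-formatted input and generate a completion.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Loading the full FP8/bf16 weights requires a GPU with sufficient memory; the `device_map="auto"` setting lets accelerate place layers automatically.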

Training Details

The model was fine-tuned using the mlfoundations-dev/seed_math_multiple_samples_scale_up_scaredy_cat_test dataset. Key training hyperparameters include:

  • Learning Rate: 1e-05
  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: A total training batch size of 96 (1 sample per device × 8 GPUs × 12 gradient accumulation steps)
  • Epochs: 3.0
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio

Key Capabilities

Specific capabilities and intended uses are not detailed in the current model information. However, as a fine-tuned version of Qwen2.5-7B-Instruct, it likely inherits and potentially specializes in areas such as:

  • Instruction following
  • General language understanding and generation

Good for

The optimal use cases for this fine-tune are not explicitly documented. The dataset name suggests a focus on math problem samples, but this is not confirmed by the card. Users should refer to the base model's documentation for general capabilities and run their own evaluations before relying on it for specialized tasks.