mlfoundations-dev/seed_math_nvidia_math

Text generation · 8B parameters · FP8 quantization · 32k context length · llama3.1 license · Transformer architecture

The mlfoundations-dev/seed_math_nvidia_math model is an 8-billion-parameter language model fine-tuned from Meta-Llama-3.1-8B on the mlfoundations-dev/seed_math_nvidia_math dataset. It is optimized for mathematical reasoning and problem-solving, making it suitable for applications that require strong numerical and logical capabilities, and its 32,768-token context window accommodates lengthy problem statements and multi-step solutions.


Overview

The mlfoundations-dev/seed_math_nvidia_math model was produced by fine-tuning the meta-llama/Meta-Llama-3.1-8B base model on the mlfoundations-dev/seed_math_nvidia_math dataset, giving it a specialization in mathematical domains. The model reached a final validation loss of 0.4341 during training.

Key Characteristics

  • Base Model: Fine-tuned from Meta-Llama-3.1-8B.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a context window of 32768 tokens.
  • Specialization: Optimized for mathematical tasks and reasoning through targeted dataset training.
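The snippet below is a minimal usage sketch that loads the model for math-focused generation via the standard transformers text-generation API. The prompt, dtype, and decoding settings are illustrative assumptions, not taken from the card.

```python
# Minimal inference sketch; model ID is from the card, everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/seed_math_nvidia_math"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the 8B weights on a single modern GPU
    device_map="auto",
)

prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps arithmetic output deterministic; sampling is optional.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```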

Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and a total batch size of 512 (a per-device batch size of 8 across 8 devices, with 8 gradient accumulation steps). The optimizer was AdamW with standard betas and epsilon, paired with a constant learning-rate scheduler. Training used Transformers 4.46.1, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3.
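For reference, here is a sketch of those hyperparameters expressed as transformers TrainingArguments. The numeric values come from the card; the output path, the adamw_torch optimizer name (a stand-in for "AdamW with standard betas and epsilon"), and the bf16 flag are assumptions, not confirmed settings.

```python
# Hyperparameter sketch only; this does not reproduce the full training pipeline.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./seed_math_nvidia_math",  # placeholder path
    learning_rate=5e-06,
    per_device_train_batch_size=8,         # 8 per device x 8 devices x 8 accum steps = 512 total
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="constant",
    optim="adamw_torch",                   # assumption: AdamW with default betas/epsilon
    bf16=True,                             # assumption: common choice for Llama-3.1 fine-tunes
)
```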