HuggingFaceTB/FineMath-Llama-3B
  • Task: Text generation
  • Model size: 3.2B parameters
  • Quantization: BF16
  • Context length: 32k
  • Published: Jan 6, 2025
  • License: apache-2.0
  • Architecture: Transformer

HuggingFaceTB/FineMath-Llama-3B is a 3-billion-parameter language model based on Llama 3.2, continually pre-trained by HuggingFaceTB on 160 billion tokens from the FineMath and FineWeb-Edu datasets. It shows markedly better mathematical performance than its base model while maintaining strong general reasoning and common-sense capabilities, and it is intended for English text-completion tasks requiring advanced mathematical understanding.

Overview

HuggingFaceTB/FineMath-Llama-3B is a 3-billion-parameter model built on the Llama 3.2 architecture. HuggingFaceTB continually pre-trained it on a 160-billion-token dataset mixed as 40% FineWeb-Edu and 60% FineMath (a high-quality math dataset). This specialized training significantly enhances its mathematical proficiency.
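
The 40/60 split implies roughly 64 billion FineWeb-Edu tokens and 96 billion FineMath tokens. For readers who want to experiment with a similar mixture, below is a minimal sketch using the datasets library; the repository names and config strings ("HuggingFaceFW/fineweb-edu" with the "sample-10BT" subset, "HuggingFaceTB/finemath" with "finemath-4plus") are assumptions based on the public dataset releases, and the actual training data was prepared with datatrove, not this code.

```python
# Hedged sketch: approximating the reported 40/60 FineWeb-Edu/FineMath
# mixture with the `datasets` library. Repo names and configs are
# assumptions; the real pipeline used datatrove and nanotron.
from datasets import load_dataset, interleave_datasets

fineweb_edu = load_dataset(
    "HuggingFaceFW/fineweb-edu", "sample-10BT", split="train", streaming=True
)
finemath = load_dataset(
    "HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True
)

# interleave_datasets samples documents (not tokens) with the given
# probabilities, so this only approximates the 40%/60% token-level split.
mixture = interleave_datasets(
    [fineweb_edu, finemath], probabilities=[0.4, 0.6], seed=42
)

for example in mixture.take(3):
    print(example["text"][:200])
```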

Key Capabilities

  • Enhanced Mathematical Performance: Achieves superior results on math-related tasks compared to the base Llama 3.2 3B model.
  • Maintained General Intelligence: Preserves strong performance across knowledge, reasoning, and common sense benchmarks.
  • English Text Completion: Primarily intended for generating English text, particularly in contexts requiring mathematical understanding (see the usage sketch below).
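
Since the model is a plain causal LM, it can be loaded with the standard transformers API. A minimal text-completion sketch follows; the prompt and generation settings are illustrative, not recommendations from the model card.

```python
# Minimal text-completion sketch with transformers; the model is not
# instruction-tuned, so prompts should read like text to be continued.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/FineMath-Llama-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "To find the greatest common divisor of 48 and 180, we"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```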

Training Details

The model was trained using nanotron on 64 H100 GPUs, leveraging datatrove for tokenization and lighteval for evaluation. It is part of a series of ablation models developed for the FineMath project.
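
lighteval's command-line interface changes between versions, so rather than guess at flags, here is a hedged manual spot-check that reuses the model and tokenizer from the earlier sketch: it pulls one GSM8K test item with the datasets library and prints the model's completion next to the reference answer. This substitutes a quick manual probe for a proper lighteval run.

```python
# Manual spot-check in place of a full lighteval run: sample a GSM8K test
# question, let the model complete it, and compare against the reference.
# Assumes `model` and `tokenizer` from the previous sketch are in scope.
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main", split="test")
item = gsm8k[0]

# Zero-shot completion; a real evaluation would use lighteval's
# prompt formats and answer extraction.
inputs = tokenizer(item["question"], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

print("Model completion:\n", tokenizer.decode(outputs[0], skip_special_tokens=True))
print("\nReference answer:\n", item["answer"])
```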

Limitations

Because it was trained predominantly on English math and educational data, its performance in other languages may be limited. The model is not instruction-tuned: it is intended for text completion rather than conversational or instruction-following use. Users should also be aware that it may reproduce biases or harmful content present in its training data.