HuggingFaceTB/finemath-ablation-finemath-infimath-3plus
HuggingFaceTB/finemath-ablation-finemath-infimath-3plus is a 3.21-billion-parameter Llama3-based model, continually pre-trained by HuggingFaceTB on 60 billion tokens drawn from a 50/50 mix of the FineMath-3+ and InfiWebMath-3+ datasets. The model is designed for mathematical text completion in English and serves as an ablation study for comparing performance under controlled training conditions. It has a 32,768-token context length and is intended primarily for research into math-focused language model capabilities.
Model Overview
This model, developed by HuggingFaceTB, is a 3.21-billion-parameter Llama3-based language model. It was continually pre-trained on 60 billion tokens using a 50/50 mixture of FineMath-3+ and InfiWebMath-3+ data, both released as part of the FineMath dataset. Its primary purpose is to serve as an ablation study within the FineMath project, enabling performance comparisons against other models trained under matched conditions.
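For orientation, here is a minimal loading-and-completion sketch using the Transformers library; the prompt and generation settings are illustrative assumptions, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-finemath-infimath-3plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",
)

# Illustrative math prompt; the model is a base completion model, not chat-tuned.
prompt = "The derivative of f(x) = x^2 sin(x) is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding is used here for reproducibility; sampling parameters can be tuned for more varied completions.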
Key Capabilities
- Mathematical Text Completion: Optimized for generating English text with a strong focus on mathematical content.
- Research & Ablation Studies: Intended for comparing performance against other models in controlled experimental setups.
- Intermediate Checkpoints: Provides access to intermediate training checkpoints (e.g., the `10B` revision) for detailed analysis of training progression; see the loading sketch after this list.
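Assuming the intermediate checkpoints are published as Git revisions of the model repository (as the `10B` example above suggests), a specific checkpoint could be loaded by passing `revision` to `from_pretrained`. The revision name below is an assumption and should be checked against the repository's branch list.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/finemath-ablation-finemath-infimath-3plus",
    revision="10B",  # assumed: checkpoint after 10B training tokens
)
```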
Training Details
The model was trained for 60,000 steps, processing 60 billion tokens in bfloat16 precision on 64 H100 GPUs. Training used nanotron, with datatrove for tokenization and lighteval for evaluation. Evaluation followed the SmolLM2 setup; details are available on the SmolLM evaluation page.
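As a back-of-envelope check of those figures (assuming uniform steps and even data-parallel sharding, neither of which is stated on the card):

```python
total_tokens = 60e9   # 60 billion training tokens (from the card)
steps = 60_000        # training steps (from the card)
gpus = 64             # H100 GPUs (from the card)

tokens_per_step = total_tokens / steps        # 1,000,000 tokens per step
tokens_per_gpu_step = tokens_per_step / gpus  # 15,625 tokens per GPU per step
print(f"{tokens_per_step:,.0f} tokens/step, {tokens_per_gpu_step:,.0f} per GPU per step")
```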
Limitations
As the model was predominantly trained on English mathematical data, its performance in other languages or non-mathematical domains may be limited. Users should also be aware that the model's behavior and potential biases are influenced by its training data.