Model Overview
HuggingFaceTB/finemath-ablation-3plus-160B is a 3.21 billion parameter model built on the Llama3 architecture, developed by HuggingFaceTB as part of their FineMath ablation research. It was pretrained for 60,000 steps on a total of 160 billion tokens. The training mixture consisted of 40% FineWeb-Edu, 30% FineMath-3+, and 30% InfiWebMath-3+, with the two mathematical subsets drawn from the FineMath dataset.
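The reported parameter count and precision can be checked by loading the model with the standard transformers API; the snippet below is a minimal sketch using the model ID from this card, not part of the official release.

```python
import torch
from transformers import AutoModelForCausalLM

# Load in bfloat16, the precision this card reports for training.
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/finemath-ablation-3plus-160B",
    torch_dtype=torch.bfloat16,
)

# Sum parameter tensors to check the reported parameter count.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")
```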
Key Characteristics
- Architecture: Llama3-based, 3.21 billion parameters.
- Training Data: 60% mathematical content (FineMath-3+ and InfiWebMath-3+), with the remaining 40% from FineWeb-Edu.
- Context Length: 4096 tokens.
- Precision: Trained using bfloat16.
- Intermediate Checkpoints: Available at 10,000-step intervals (10B tokens) in separate branches for research and analysis.
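Loading one of these intermediate checkpoints amounts to passing the branch name as the `revision` argument to `from_pretrained`. The sketch below assumes a branch naming scheme that is purely illustrative; check the repository's branch list for the actual names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-3plus-160B"

# Hypothetical branch name; consult the repo's branches for the exact 10,000-step labels.
checkpoint_branch = "step-10000"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=checkpoint_branch)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=checkpoint_branch)
```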
Intended Use and Limitations
This model is primarily intended for text completion in English, with a strong focus on mathematical content. It is not instruction-tuned, so it is best suited to continuing text rather than following explicit instructions. A key purpose of the model is to serve as a comparison point within the FineMath research initiative, evaluating how specific mathematical data mixtures affect model performance. Because of this specialized training, its performance may be limited in non-mathematical or multilingual contexts, and users should be aware of potential biases or harmful content inherited from its training data.
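Since the model is a base (non-instruct) model, prompts should be text to continue rather than instructions. The example below is a minimal completion sketch using the standard transformers generation API; the prompt and decoding settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-3plus-160B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Completion-style prompt: give the model text to continue, not an instruction.
prompt = "To find the derivative of f(x) = x^3 - 2x, we"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```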