HuggingFaceTB/finemath-ablation-4plus-160B
HuggingFaceTB/finemath-ablation-4plus-160B is a 3.21 billion parameter model based on Llama 3.2-3B, continued-pretrained by HuggingFaceTB for 160 billion tokens on a math-focused dataset mix that includes FineMath-4+ and InfiWebMath-4+. The model targets text completion in English with a strong emphasis on mathematical content, making it suitable for research and comparative performance analysis on math-centric language tasks.
Model Overview
This model, developed by HuggingFaceTB, is an ablation-study variant built on the Llama 3.2-3B architecture, with 3.21 billion parameters. It underwent continued pretraining for a total of 160 billion tokens on a mix of 40% FineWeb-Edu, 30% FineMath-4+, and 30% InfiWebMath-4+, specifically targeting mathematical proficiency.
Key Characteristics
- Architecture: Llama 3.2-3B base model.
- Parameter Count: 3.21 billion parameters.
- Training Data: Specialized English math datasets (FineMath-4+, InfiWebMath-4+) combined with FineWeb-Edu.
- Context Length: 4096 tokens.
- Training Tokens: 160 billion tokens.
- Intermediate Checkpoints: Available at 10B token intervals for detailed analysis (see the loading sketch after this list).
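Intermediate checkpoints published on the Hugging Face Hub are typically exposed as revisions of the model repository and can be loaded via the `revision` argument in transformers. A minimal sketch; the revision name `checkpoint-100B` is a hypothetical placeholder, so check the repository's branch list for the actual names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-4plus-160B"

# Hypothetical revision name for the checkpoint after 100B training tokens;
# the real branch/tag names are listed in the model repository.
revision = "checkpoint-100B"

# Load the tokenizer and weights pinned to that intermediate checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
```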
Intended Use Cases
This model is primarily intended for text completion in English with a focus on mathematical content. It is a base model without instruction tuning, so it is best suited for:
- Research and Ablation Studies: Comparing its performance against other models trained under similar conditions.
- Mathematical Text Generation: Generating or completing text within a mathematical domain (a minimal usage sketch follows this list).
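A minimal text-completion sketch using the transformers library; the prompt and generation settings below are illustrative, not taken from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-4plus-160B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 keeps the ~3.2B-parameter model within a single GPU's memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Base model: use a plain completion prompt, not a chat template.
prompt = "The sum of the first n odd numbers equals"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is not instruction-tuned, prompts should read like the opening of a document to be continued rather than a question addressed to an assistant.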
Limitations
- Language Specificity: Predominantly trained on English math data, limiting performance in other languages.
- Bias Potential: As with all models, its behavior is influenced by biases and potentially harmful content present in its training data.