HuggingFaceTB/finemath-ablation-infiwebmath-4plus
HuggingFaceTB/finemath-ablation-infiwebmath-4plus is a 3.21-billion-parameter, Llama3-based causal language model, continuously pretrained by HuggingFaceTB on 60 billion tokens from the InfiMM-WebMath-4+ subset of the FineMath dataset. It is optimized for English mathematical text completion and is one model in the FineMath ablation series, trained so that the effect of this data subset can be compared directly against the other models in the series.
Model Overview
The model is based on the Llama3 architecture and uses the llama3 tokenizer. It was continuously pretrained for 60 billion tokens on the InfiMM-WebMath-4+ subset of the FineMath dataset, has a context length of 4096 tokens, and was trained in bfloat16 precision on 64 H100 GPUs.
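As a concrete starting point, the snippet below is a minimal text-completion sketch using the transformers library. The repository id is taken from this card; the prompt, generation settings, and device handling are illustrative assumptions rather than settings from the ablation itself.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-infiwebmath-4plus"

# Load in bfloat16, matching the precision the model was trained in.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cuda" if torch.cuda.is_available() else "cpu")

# The model is not instruction-tuned: phrase the input as text to be
# continued, not as a question or an instruction.
prompt = "To solve the quadratic equation x^2 - 5x + 6 = 0, we factor it as"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```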
Key Capabilities
- Mathematical Text Completion: Primarily intended for generating and completing English text with a strong focus on mathematical content.
- Ablation Study Component: Trained under the same conditions as the other models in the FineMath ablation series, so its results can be compared directly to isolate the effect of the training data.
- Intermediate Checkpoints: Intermediate checkpoints are available at 10B-token intervals, enabling analysis of training progression (see the loading sketch after this list).
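Assuming the intermediate checkpoints are published as repository revisions, as is common for Hugging Face ablation models, a specific checkpoint could be loaded with the revision argument. The revision name below is a hypothetical placeholder; check the repository's branch list for the actual naming scheme.

```python
from transformers import AutoModelForCausalLM

# "10B" is a hypothetical revision name; the repository's branches list
# the real names of the checkpoints saved every 10B tokens.
checkpoint = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/finemath-ablation-infiwebmath-4plus",
    revision="10B",
)
```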
Intended Use Cases
This model is suitable for research and development focused on mathematical reasoning and text generation; its primary purpose is to serve as a benchmark and comparison point within the FineMath ablation studies. Because it is not instruction-tuned, it is best used for text-completion tasks in English mathematical domains. Its limitations include potentially reduced performance in non-English languages and biases inherited from its training data.