HuggingFaceTB/finemath-ablation-owm is a 3.21 billion parameter Llama3-based language model, part of the FineMath ablation studies, continuously pre-trained on 60 billion tokens from the OpenWebMath dataset. This model is specifically designed for text completion with a strong focus on mathematical tasks in English. Its primary purpose is to serve as a comparative baseline for evaluating performance against other models trained under similar conditions, rather than being an instruction-tuned model for general use.
Model Overview
HuggingFaceTB/finemath-ablation-owm is a 3.21 billion parameter model built on the Llama3 architecture. It is a component of the FineMath ablation series, in which the base Llama-3.2-3B model undergoes continuous pre-training on specialized mathematical datasets. This particular iteration was trained for 60 billion tokens on the OpenWebMath dataset using the Llama 3 tokenizer, with a context length of 4096 tokens.
Key Characteristics
- Architecture: Llama3-based, 3.21 billion parameters.
- Training Data: Exclusively pre-trained on 60 billion tokens from the OpenWebMath dataset, focusing on English mathematical content.
- Purpose: Primarily intended for text completion in English with a strong emphasis on math. It is not instruction-tuned.
- Research Focus: Designed as an experimental model to compare performance within the FineMath ablation studies, rather than an optimized model for direct application.
- Intermediate Checkpoints: Provides access to intermediate checkpoints at 10B token intervals for detailed analysis of training progression.
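Intermediate checkpoints on the Hugging Face Hub are typically published as git revisions of the model repository. Assuming that convention here, a sketch of loading a specific checkpoint might look like the following; the revision naming scheme (`"10B"`, `"20B"`, ...) is an assumption, so check the repository's branch list for the actual names.

```python
MODEL_ID = "HuggingFaceTB/finemath-ablation-owm"

def checkpoint_revision(tokens_billions: int) -> str:
    # Hypothetical revision naming for the 10B-token-interval
    # checkpoints; verify against the repo's actual branches.
    if tokens_billions % 10 != 0 or not (10 <= tokens_billions <= 60):
        raise ValueError("checkpoints are published at 10B-token intervals up to 60B")
    return f"{tokens_billions}B"

def load_checkpoint(tokens_billions: int):
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    rev = checkpoint_revision(tokens_billions)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=rev)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=rev)
    return tokenizer, model
```

Loading checkpoints this way makes it straightforward to plot, say, math-benchmark accuracy against tokens seen during continuous pre-training.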
Intended Use Cases
- Mathematical Text Completion: Suited to generating continuations of text about mathematical concepts and problems.
- Research and Ablation Studies: Ideal for researchers comparing the impact of different math-focused pre-training strategies.
- Baseline for Math-centric LLMs: Can serve as a foundational model for further fine-tuning or analysis in mathematical domains.
Limitations
- Language Specificity: Predominantly trained on English math data, limiting its performance in other languages.
- Bias: As with all language models, it may reproduce biases and harmful content present in its training data.
- Not Instruction-Tuned: It is not designed to follow instructions, so tasks must be framed as text-completion prompts rather than commands or questions directed at the model.
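Because the model is a base (non-instruction-tuned) completion model, a common workaround is to phrase the input as the beginning of a document the model can plausibly continue. A minimal sketch, assuming the standard `transformers` generation API (the `Problem:`/`Solution:` framing is an illustrative convention, not something prescribed by the model card):

```python
def completion_prompt(problem: str) -> str:
    # Frame the task as text to be continued: a base model continues
    # the pattern rather than "answering" an instruction.
    return f"Problem: {problem}\nSolution:"

def generate_solution(problem: str, max_new_tokens: int = 128) -> str:
    # Standard transformers text-completion loop (assumes the
    # `transformers` and `torch` packages are installed).
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_id = "HuggingFaceTB/finemath-ablation-owm"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(completion_prompt(problem), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is used above because deterministic outputs make baseline comparisons across ablation runs easier to reproduce.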