Model Overview
This model, HuggingFaceTB/finemath-ablation-fwedu, is a 3.21-billion-parameter Llama3-based causal language model. It is an experimental model developed as part of the FineMath ablation studies, which examine the impact of different math datasets during pretraining. The model was continually pretrained for 60 billion tokens on the FineWeb-Edu dataset using the Llama 3 tokenizer, with a context length of 4096 tokens.
Key Characteristics
- Architecture: Llama3-based, 3.21 billion parameters.
- Training: Continually pretrained on 60 billion tokens of FineWeb-Edu over 60,000 steps.
- Purpose: Primarily intended for comparative analysis within the FineMath ablation studies to evaluate math-focused pretraining.
- Language: English-only. It was trained on general educational web content (FineWeb-Edu) rather than a math-specific dataset, serving as the non-math baseline in the FineMath ablations.
- Intermediate Checkpoints: Available at 10,000-step intervals (10B tokens) in separate branches for detailed analysis.
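A minimal sketch of loading one of the intermediate checkpoints with `transformers`. The branch naming is an assumption (here, the training step number as a string); check the repository's branch list on the Hub for the actual names. The tokens-per-step figure is derived from the card itself (60B tokens over 60,000 steps).

```python
def tokens_seen(step: int) -> int:
    """Approximate training tokens seen by a checkpoint at `step`.

    Derived from the card: 60B tokens / 60,000 steps = 1M tokens per step.
    """
    tokens_per_step = 60_000_000_000 // 60_000
    return step * tokens_per_step


def load_checkpoint(step: int):
    """Load the checkpoint from the branch for `step`.

    The branch name `str(step)` is a hypothetical convention; verify it
    against the repository's branches on the Hugging Face Hub.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/finemath-ablation-fwedu"
    revision = str(step)  # assumed branch naming
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
    return tokenizer, model
```

For example, `tokens_seen(10_000)` gives the ~10B tokens mentioned above for the first intermediate checkpoint.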
Intended Use
This model is not instruction-tuned and is designed for English text completion. Its main utility lies in research, specifically in comparing its performance against other models trained under identical conditions within the FineMath project. Note that because its training data is specialized, performance in other domains or languages may be limited.
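Since the model is not instruction-tuned, plain continuation prompts work best. A minimal usage sketch, assuming the `transformers` text-generation pipeline; the `Problem:`/`Solution:` prompt format is an illustrative choice, not a format the model was trained on.

```python
def build_prompt(problem: str) -> str:
    """Format a problem as a plain completion prompt (no chat template,
    since the model is not instruction-tuned)."""
    return f"Problem: {problem}\nSolution:"


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation of `prompt` with the ablation model.

    Imported lazily because loading the 3.21B-parameter model triggers a
    large download.
    """
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="HuggingFaceTB/finemath-ablation-fwedu",
    )
    out = generator(prompt, max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]
```

Usage: `complete(build_prompt("What is 12 * 7?"))` returns the prompt followed by the model's continuation.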