HuggingFaceTB/finemath-ablation-4plus-160B

Text generation · 3.2B parameters · BF16 weights · Published: Dec 19, 2024 · License: apache-2.0 · Architecture: Transformer (open weights)

HuggingFaceTB/finemath-ablation-4plus-160B is a 3.21-billion-parameter model based on Llama 3.2-3B, continued-pretrained by HuggingFaceTB on 160 billion tokens of a specialized math data mix that includes FineMath-4+ and InfiWebMath-4+. The model is designed for English text completion with a strong focus on mathematical content, making it suitable for research and comparative performance analysis on math-centric language tasks.
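Below is a minimal text-completion sketch using the transformers library. It is a generic `AutoModelForCausalLM` loading pattern, not code from the model card, and the prompt is purely illustrative:

```python
# Minimal completion sketch; assumes transformers, accelerate, and a GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-4plus-160B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# Base model: give it text to continue, not an instruction.
prompt = "Theorem: The sum of the first n odd numbers equals"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```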


Model Overview

This model, developed by HuggingFaceTB, is an ablation-study variant built on the Llama 3.2-3B architecture with 3.21 billion parameters. It underwent continued pretraining for 160 billion tokens aimed at strengthening mathematical proficiency, using a data mix of 40% FineWeb-Edu, 30% FineMath-4+, and 30% InfiWebMath-4+.
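The per-source token budgets implied by this mix follow directly from the published percentages and the 160B total (simple arithmetic, shown here for reference):

```python
# Per-source token budgets implied by the published mixture ratios.
TOTAL_TOKENS = 160e9
mix = {"FineWeb-Edu": 0.40, "FineMath-4+": 0.30, "InfiWebMath-4+": 0.30}

for source, share in mix.items():
    print(f"{source}: {share * TOTAL_TOKENS / 1e9:.0f}B tokens")
# FineWeb-Edu: 64B tokens; FineMath-4+: 48B tokens; InfiWebMath-4+: 48B tokens
```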

Key Characteristics

  • Architecture: Llama 3.2-3B base model.
  • Parameter Count: 3.21 billion parameters.
  • Training Data: Specialized English math datasets (FineMath-4+, InfiWebMath-4+) combined with FineWeb-Edu.
  • Context Length: 4096 tokens.
  • Training Tokens: 160 billion tokens.
  • Intermediate Checkpoints: Available at 10B-token intervals for detailed analysis (see the loading sketch after this list).
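If the intermediate checkpoints are published as Git revisions of the model repository, they can be loaded with the standard `revision` argument. The revision name below is hypothetical; check the repository's branch list for the actual naming scheme:

```python
# Load an intermediate checkpoint by Git revision.
# NOTE: "10B-tokens" is a hypothetical branch name, used only for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/finemath-ablation-4plus-160B",
    revision="10B-tokens",  # hypothetical: one checkpoint per 10B training tokens
)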

Intended Use Cases

This model is primarily intended for English text completion with a focus on mathematical content. It is a base model, not instruction-tuned, so it works best with completion-style prompts rather than chat instructions. Suitable uses include:

  • Research and Ablation Studies: Comparing its performance against other models trained under similar conditions.
  • Mathematical Text Generation: Generating or completing text within a mathematical domain, as in the prompting sketch below.
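Because the model is not instruction-tuned, few-shot completion prompts generally work better than direct instructions. A sketch of that prompting pattern follows; the example problems are illustrative, not from the model card:

```python
# Few-shot completion prompting for a base (non-instruction-tuned) model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/finemath-ablation-4plus-160B",
)

# Pattern the desired behavior with worked examples, then leave the last answer open.
prompt = (
    "Q: What is 12 * 8?\nA: 96\n\n"
    "Q: What is the derivative of x^2?\nA: 2x\n\n"
    "Q: What is 15% of 200?\nA:"
)
print(generator(prompt, max_new_tokens=16, do_sample=False)[0]["generated_text"])
```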

Limitations

  • Language Specificity: Predominantly trained on English math data, limiting performance in other languages.
  • Bias Potential: Like other language models, it can reproduce biases and harmful content present in its web-derived training data.