HuggingFaceTB/finemath-ablation-3plus-160B

Text generation · Model size: 3.2B · Quant: BF16 · Context length: 32k · Published: Dec 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

HuggingFaceTB/finemath-ablation-3plus-160B is a 3.21 billion parameter Llama3-based causal language model, part of the FineMath ablation studies. It was pretrained on 160 billion tokens, with a significant focus on mathematical datasets (FineMath-3+ and InfiWebMath-3+), alongside FineWeb-Edu. This model is specifically designed for text completion in English with an emphasis on mathematical reasoning and performance comparison within the FineMath research framework.


Model Overview

HuggingFaceTB/finemath-ablation-3plus-160B is a 3.21 billion parameter model built on the Llama3 architecture, developed by HuggingFaceTB as part of their FineMath ablation research. It was pretrained for 60,000 steps, consuming a total of 160 billion tokens. The training mix was 40% FineWeb-Edu, 30% FineMath-3+, and 30% InfiWebMath-3+, with the two math subsets drawn from the FineMath dataset release.
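Assuming the checkpoint is published on the Hugging Face Hub under the repo id above and loads through the standard `transformers` causal-LM API, usage might look like this minimal sketch (the prompt and generation settings are illustrative, not taken from the card):

```python
# Minimal sketch: greedy text completion with the base (non-instruct) model.
# Repo id is from the model card; everything else here is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

REPO_ID = "HuggingFaceTB/finemath-ablation-3plus-160B"

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Complete a text prompt; the model is not instruction-tuned,
    so it should be prompted with text to continue, not commands."""
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(complete("The derivative of x^2 + 3x is"))
```

Because the model was trained in bfloat16, loading it in that dtype (as above) avoids an unnecessary upcast to float32.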

Key Characteristics

  • Architecture: Llama3-based, 3.21 billion parameters.
  • Training Data: Heavily weighted towards mathematical datasets (FineMath-3+, InfiWebMath-3+).
  • Context Length: 4096 tokens.
  • Precision: Trained using bfloat16.
  • Intermediate Checkpoints: Available at 10,000-step intervals (10B tokens) in separate branches for research and analysis.

Intended Use and Limitations

This model is primarily intended for text completion in English with a strong focus on mathematical content. It is not instruction-tuned, so it is suited to free-form completion rather than following explicit instructions. A key purpose of this model is to serve as a comparative tool within the FineMath research initiative, evaluating the impact of specific mathematical data mixes on model performance. Due to its specialized training, its performance may be limited in non-mathematical or multilingual contexts. Users should also be aware of potential biases or harmful content inherited from its training data.