HuggingFaceTB/finemath-ablation-3plus-160B

Text generation · Model size: 3.2B · Quant: BF16 · Context length: 32k · Published: Dec 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

HuggingFaceTB/finemath-ablation-3plus-160B is a 3.21 billion parameter Llama3-based causal language model, part of the FineMath ablation studies. It was pretrained on 160 billion tokens, with a significant focus on mathematical datasets (FineMath-3+ and InfiWebMath-3+), alongside FineWeb-Edu. This model is specifically designed for text completion in English with an emphasis on mathematical reasoning and performance comparison within the FineMath research framework.


Model Overview

HuggingFaceTB/finemath-ablation-3plus-160B is a 3.21 billion parameter model built on the Llama3 architecture, developed by HuggingFaceTB as part of their FineMath ablation research. It was pretrained for 60,000 steps, consuming a total of 160 billion tokens. The training mix was 40% FineWeb-Edu, 30% FineMath-3+, and 30% InfiWebMath-3+, with the two math subsets drawn from the FineMath dataset release.
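Assuming the checkpoint is published on the Hugging Face Hub under the repo id above and loads through the standard `transformers` causal-LM API, usage might look like this minimal sketch (the prompt and generation settings are illustrative, not taken from the card):

```python
# Minimal sketch: greedy text completion with the base (non-instruct) model.
# Repo id is from the model card; everything else here is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

REPO_ID = "HuggingFaceTB/finemath-ablation-3plus-160B"

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Complete a text prompt; the model is not instruction-tuned,
    so it should be prompted with text to continue, not commands."""
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(complete("The derivative of x^2 + 3x is"))
```

Because the model was trained in bfloat16, loading it in that dtype (as above) avoids an unnecessary upcast to float32.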

Key Characteristics

  • Architecture: Llama3-based, 3.21 billion parameters.
  • Training Data: Heavily weighted towards mathematical datasets (FineMath-3+, InfiWebMath-3+).
  • Context Length: 4096 tokens.
  • Precision: Trained using bfloat16.
  • Intermediate Checkpoints: Available at 10,000-step intervals (10B tokens) in separate branches for research and analysis.

Intended Use and Limitations

This model is primarily intended for text completion in English with a strong focus on mathematical content. It is not instruction-tuned, so it is suited to free-form completion rather than following explicit instructions. A key purpose of this model is to serve as a comparative tool within the FineMath research initiative, evaluating the impact of specific mathematical data mixes on model performance. Due to its specialized training, its performance may be limited in non-mathematical or multilingual contexts. Users should also be aware of potential biases or harmful content inherited from its training data.