HuggingFaceTB/finemath-ablation-finemath-infimath-3plus
HuggingFaceTB/finemath-ablation-finemath-infimath-3plus is a 3.21-billion-parameter Llama3-based model, continually pre-trained by HuggingFaceTB on 60 billion tokens drawn from a 50/50 mix of the FineMath-3+ and InfiWebMath-3+ datasets. The model is designed for mathematical text completion in English and serves as an ablation study for comparing performance under controlled training conditions. It has a 32,768-token context length and is intended primarily for research into math-focused language model capabilities.
Model Overview
This model, developed by HuggingFaceTB, is a 3.21-billion-parameter Llama3-based language model. It was continually pre-trained on 60 billion tokens using a 50/50 mixture of FineMath-3+ and InfiWebMath-3+ data, both released as part of the FineMath dataset. Its primary purpose is to serve as an ablation study within the FineMath project, enabling performance comparisons against other models trained under matched conditions.
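For orientation, here is a minimal loading-and-completion sketch using the Transformers library; the prompt and generation settings are illustrative assumptions, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-finemath-infimath-3plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",
)

# Illustrative math prompt; the model is a base completion model, not chat-tuned.
prompt = "The derivative of f(x) = x^2 sin(x) is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding is used here for reproducibility; sampling parameters can be tuned for more varied completions.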
Key Capabilities
- Mathematical Text Completion: Optimized for generating English text with a strong focus on mathematical content.
- Research & Ablation Studies: Intended for comparing performance against other models in controlled experimental setups.
- Intermediate Checkpoints: Provides access to intermediate training checkpoints (e.g., the `10B` revision) for detailed analysis of training progression; see the loading sketch after this list.
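Assuming the intermediate checkpoints are published as Git revisions of the model repository (as the `10B` example above suggests), a specific checkpoint could be loaded by passing `revision` to `from_pretrained`. The revision name below is an assumption and should be checked against the repository's branch list.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/finemath-ablation-finemath-infimath-3plus",
    revision="10B",  # assumed: checkpoint after 10B training tokens
)
```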
Training Details
The model was trained for 60,000 steps, processing 60 billion tokens in bfloat16 precision on 64 H100 GPUs. Training used nanotron, with datatrove for tokenization and lighteval for evaluation. Evaluation followed the SmolLM2 setup; details are available on the SmolLM evaluation page.
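As a back-of-envelope check of those figures (assuming uniform steps and even data-parallel sharding, neither of which is stated on the card):

```python
total_tokens = 60e9   # 60 billion training tokens (from the card)
steps = 60_000        # training steps (from the card)
gpus = 64             # H100 GPUs (from the card)

tokens_per_step = total_tokens / steps        # 1,000,000 tokens per step
tokens_per_gpu_step = tokens_per_step / gpus  # 15,625 tokens per GPU per step
print(f"{tokens_per_step:,.0f} tokens/step, {tokens_per_gpu_step:,.0f} per GPU per step")
```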
Limitations
As the model was predominantly trained on English mathematical data, its performance in other languages or non-mathematical domains may be limited. Users should also be aware that the model's behavior and potential biases are influenced by its training data.