HuggingFaceTB/finemath-ablation-4plus-160B

Text generation · 3.2B parameters · BF16 weights · Published: Dec 19, 2024 · License: apache-2.0 · Architecture: Transformer (open weights)

HuggingFaceTB/finemath-ablation-4plus-160B is a 3.21-billion-parameter model based on Llama 3.2-3B, continued-pretrained by HuggingFaceTB on 160 billion tokens of a specialized math data mix that includes FineMath-4+ and InfiWebMath-4+. The model is designed for English text completion with a strong focus on mathematical content, making it suitable for research and comparative performance analysis on math-centric language tasks.
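Below is a minimal text-completion sketch using the transformers library. It is a generic `AutoModelForCausalLM` loading pattern, not code from the model card, and the prompt is purely illustrative:

```python
# Minimal completion sketch; assumes transformers, accelerate, and a GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/finemath-ablation-4plus-160B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# Base model: give it text to continue, not an instruction.
prompt = "Theorem: The sum of the first n odd numbers equals"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```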


Model Overview

This model, developed by HuggingFaceTB, is an ablation-study variant built on the Llama 3.2-3B architecture with 3.21 billion parameters. It underwent continued pretraining for 160 billion tokens aimed at strengthening mathematical proficiency, using a data mix of 40% FineWeb-Edu, 30% FineMath-4+, and 30% InfiWebMath-4+.
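The per-source token budgets implied by this mix follow directly from the published percentages and the 160B total (simple arithmetic, shown here for reference):

```python
# Per-source token budgets implied by the published mixture ratios.
TOTAL_TOKENS = 160e9
mix = {"FineWeb-Edu": 0.40, "FineMath-4+": 0.30, "InfiWebMath-4+": 0.30}

for source, share in mix.items():
    print(f"{source}: {share * TOTAL_TOKENS / 1e9:.0f}B tokens")
# FineWeb-Edu: 64B tokens; FineMath-4+: 48B tokens; InfiWebMath-4+: 48B tokens
```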

Key Characteristics

  • Architecture: Llama 3.2-3B base model.
  • Parameter Count: 3.21 billion parameters.
  • Training Data: Specialized English math datasets (FineMath-4+, InfiWebMath-4+) combined with FineWeb-Edu.
  • Context Length: 4096 tokens.
  • Training Tokens: 160 billion tokens.
  • Intermediate Checkpoints: Available at 10B-token intervals for detailed analysis (see the loading sketch after this list).
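If the intermediate checkpoints are published as Git revisions of the model repository, they can be loaded with the standard `revision` argument. The revision name below is hypothetical; check the repository's branch list for the actual naming scheme:

```python
# Load an intermediate checkpoint by Git revision.
# NOTE: "10B-tokens" is a hypothetical branch name, used only for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/finemath-ablation-4plus-160B",
    revision="10B-tokens",  # hypothetical: one checkpoint per 10B training tokens
)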

Intended Use Cases

This model is primarily intended for English text completion with a focus on mathematical content. It is a base model, not instruction-tuned, so it works best with completion-style prompts rather than chat instructions. Suitable uses include:

  • Research and Ablation Studies: Comparing its performance against other models trained under similar conditions.
  • Mathematical Text Generation: Generating or completing text within a mathematical domain, as in the prompting sketch below.
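Because the model is not instruction-tuned, few-shot completion prompts generally work better than direct instructions. A sketch of that prompting pattern follows; the example problems are illustrative, not from the model card:

```python
# Few-shot completion prompting for a base (non-instruction-tuned) model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/finemath-ablation-4plus-160B",
)

# Pattern the desired behavior with worked examples, then leave the last answer open.
prompt = (
    "Q: What is 12 * 8?\nA: 96\n\n"
    "Q: What is the derivative of x^2?\nA: 2x\n\n"
    "Q: What is 15% of 200?\nA:"
)
print(generator(prompt, max_new_tokens=16, do_sample=False)[0]["generated_text"])
```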

Limitations

  • Language Specificity: Predominantly trained on English math data, limiting performance in other languages.
  • Bias Potential: Like other language models, it can reproduce biases and harmful content present in its web-derived training data.