HuggingFaceTB/finemath-ablation-fwedu

Text generation · Model size: 3.2B · Quantization: BF16 · Context length: 32k · Published: Dec 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

HuggingFaceTB/finemath-ablation-fwedu is a 3.21 billion parameter Llama3-based causal language model, part of the FineMath ablation studies. It was continuously pretrained on 60 billion tokens from the FineWeb-Edu dataset with a 4096-token context length. The model is designed for English text completion with a strong focus on mathematical content, and serves primarily as a comparison point for evaluating math-focused pretraining strategies.


Model Overview

This model, HuggingFaceTB/finemath-ablation-fwedu, is a 3.21 billion parameter Llama3-based causal language model. It is an experimental model developed as part of the FineMath ablation studies, which examine the impact of different math datasets during pretraining. The model was continuously pretrained for 60 billion tokens on the FineWeb-Edu dataset using the Llama3 tokenizer, with a context length of 4096 tokens.
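
As a quick sanity check on these figures, the standard transformers API can read the repository's configuration and count parameters. This is a minimal sketch assuming the usual Llama-style config layout; the values it prints from the actual repository are authoritative, not the numbers quoted here.

    from transformers import AutoConfig, AutoModelForCausalLM

    model_id = "HuggingFaceTB/finemath-ablation-fwedu"

    # Inspect the configuration without downloading the weights.
    config = AutoConfig.from_pretrained(model_id)
    print("Architecture:", config.architectures)
    print("Max position embeddings:", config.max_position_embeddings)

    # Loading the weights allows counting parameters (roughly 3.21B expected).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B")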

Key Characteristics

  • Architecture: Llama3-based, 3.21 billion parameters.
  • Training: Pretrained on 60 billion tokens from FineWeb-Edu over 60,000 steps.
  • Purpose: Primarily intended for comparative analysis within the FineMath ablation studies to evaluate math-focused pretraining.
  • Language: English-only, with a strong emphasis on mathematical content.
  • Intermediate Checkpoints: Available at 10,000-step intervals (10B tokens) in separate branches for detailed analysis (see the loading sketch below).
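
Because the intermediate checkpoints live in separate branches of the model repository, a specific checkpoint can be loaded with the revision argument of the transformers loaders. The branch name below is illustrative only; the actual naming scheme should be checked on the repository's branch list.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/finemath-ablation-fwedu"
    # Hypothetical branch name for the 10,000-step (10B-token) checkpoint;
    # replace it with an actual branch name from the repository.
    revision = "10000"

    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, revision=revision, torch_dtype="auto"
    )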

Intended Use

This model is not instruction-tuned and is designed for text completion in English, particularly in mathematical contexts. Its main utility lies in research and development, specifically in comparing its performance against other models trained under similar conditions within the FineMath project. Note that while it excels at math-related text completion, its performance in other domains or languages may be limited by its specialized training data.
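
Since the model is not instruction-tuned, prompts should be phrased as text to be continued rather than as instructions. The sketch below shows a minimal text-completion call with the standard transformers generation API; the prompt and generation settings are arbitrary examples, not recommendations from the model authors.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/finemath-ablation-fwedu"
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to(device)

    # Phrase the input as a passage for the model to continue.
    prompt = "The derivative of f(x) = x^2 * sin(x) is"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))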