HuggingFaceTB/finemath-ablation-owm
Text Generation · Model Size: 3.2B · Quantization: BF16 · Context Length: 32k · Published: Dec 18, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

HuggingFaceTB/finemath-ablation-owm is a 3.21-billion-parameter Llama3-based language model from the FineMath ablation studies, produced by continued pre-training on 60 billion tokens of the OpenWebMath dataset. The model is designed for text completion in English, with a strong focus on mathematical content. Its primary purpose is to serve as a comparative baseline against other models trained under similar conditions; it is not instruction-tuned and is not intended for general use.


Model Overview

HuggingFaceTB/finemath-ablation-owm is a 3.21-billion-parameter model built on the Llama3 architecture. It is part of the FineMath ablation series, in which the base Llama-3.2-3B model undergoes continued pre-training on specialized mathematical datasets. This particular variant was trained on 60 billion tokens from the OpenWebMath dataset using the llama3 tokenizer, with a training sequence length of 4096 tokens.

Key Characteristics

  • Architecture: Llama3-based, 3.21 billion parameters.
  • Training Data: Exclusively pre-trained on 60 billion tokens from the OpenWebMath dataset, focusing on English mathematical content.
  • Purpose: Primarily intended for text completion in English with a strong emphasis on math. It is not instruction-tuned.
  • Research Focus: Designed as an experimental model to compare performance within the FineMath ablation studies, rather than an optimized model for direct application.
  • Intermediate Checkpoints: Provides access to intermediate checkpoints at 10B token intervals for detailed analysis of training progression.
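
The final weights and the intermediate checkpoints can be loaded through the standard `transformers` causal-LM interface. A minimal sketch, assuming the intermediate checkpoints are published as branch revisions named by token count (e.g. `10B`); the naming scheme is an assumption, so verify it against the repository's branch list:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "HuggingFaceTB/finemath-ablation-owm"

def checkpoint_revision(tokens_billions: int) -> str:
    """Branch name for an intermediate checkpoint.

    Assumed naming scheme ("10B" for the 10-billion-token checkpoint);
    check the repository's branch list before relying on it.
    """
    return f"{tokens_billions}B"

def load(revision: str = "main"):
    """Load tokenizer and model at a given revision (final weights by default)."""
    tok = AutoTokenizer.from_pretrained(REPO, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        REPO, revision=revision, torch_dtype="bfloat16"
    )
    return tok, model

if __name__ == "__main__":
    # Load the (assumed) 30B-token checkpoint and complete a math prompt.
    tok, model = load(checkpoint_revision(30))
    inputs = tok("The derivative of x^2 with respect to x is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
```

Because checkpoints at 10B-token intervals are available, the same `load` call can be swept across revisions to track how benchmark scores evolve over the course of training.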

Intended Use Cases

  • Mathematical Text Completion: Excels at generating text related to mathematical concepts and problems.
  • Research and Ablation Studies: Ideal for researchers comparing the impact of different math-focused pre-training strategies.
  • Baseline for Math-centric LLMs: Can serve as a foundational model for further fine-tuning or analysis in mathematical domains.

Limitations

  • Language Specificity: Predominantly trained on English math data, limiting its performance in other languages.
  • Bias: As with all models, its behavior is influenced by potential biases and harmful content present in its training data.
  • Not Instruction-Tuned: Requires careful prompting for specific tasks as it is not designed for general instruction following.
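
Since the model only does text completion, prompts work best when phrased as text to be continued rather than as instructions. A common workaround is a few-shot completion prompt, sketched below; the `Problem:`/`Solution:` labels are an arbitrary illustrative choice, not a format the model was trained on:

```python
def completion_prompt(question, examples):
    """Build a few-shot completion prompt: worked examples, then the new
    problem, ending exactly where the model should continue."""
    blocks = [f"Problem: {q}\nSolution: {a}" for q, a in examples]
    blocks.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(blocks)

prompt = completion_prompt(
    "What is the derivative of x^3?",
    [("What is the derivative of x^2?", "2x")],
)
print(prompt)
```

Ending the prompt mid-pattern (after `Solution:`) steers a pure completion model toward answering in the demonstrated format, which an instruction-style prompt would not reliably achieve.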