salmannyu/Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8

Hugging Face

Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32K · Concurrency Cost: 1 · Published: Feb 8, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

salmannyu/Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8 is a 1.5-billion-parameter causal language model produced by continued pre-training of Qwen/Qwen2.5-1.5B. It was mid-trained on 52 billion tokens from the NVIDIA Nemotron-CC-Math-v1 dataset, specializing it for mathematical reasoning tasks. With a maximum context length of 131,072 tokens, the model targets applications that require robust mathematical problem-solving.

Overview

salmannyu/Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8 is a specialized 1.5-billion-parameter language model built on the Qwen2.5-1.5B architecture. It underwent continued pre-training on the NVIDIA Nemotron-CC-Math-v1 dataset, consuming approximately 52 billion tokens from the dataset's 4plus subset. The "Train 8" suffix denotes the final checkpoint of this training run, which aims to significantly strengthen the model's mathematical reasoning and problem-solving capabilities.

Key Capabilities

  • Enhanced Mathematical Reasoning: Optimized through extensive training on a dedicated math dataset.
  • Causal Language Modeling: Functions as a standard causal language model, suitable for text generation and completion (see the loading sketch after this list).
  • Qwen2.5 Base: Leverages the robust architecture of the Qwen2.5 family.
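
The card does not include usage code, but since the checkpoint is a Qwen2.5 derivative hosted on Hugging Face, it should load with the standard transformers API. The following is a minimal sketch under that assumption; the prompt text and generation settings are illustrative, not prescribed by the card.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "salmannyu/Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8"

    # Assumes standard transformers loading, as for other Qwen2.5 checkpoints.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
        device_map="auto",
    )

    prompt = "Problem: What is the sum of the first 100 positive integers?\nSolution:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding; a mid-trained base model continues text rather than chat.
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))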

Good for

  • Applications requiring strong mathematical problem-solving.
  • Research and development in AI for quantitative tasks.
  • Scenarios where a smaller, specialized model for math is preferred over larger, general-purpose LLMs.
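
Because the card describes continued pre-training rather than instruction tuning, the checkpoint is presumably a base model, and few-shot prompting is a common way to steer such models toward a fixed answer format. The template below is a hypothetical sketch; the worked examples and the Problem/Solution/Answer layout are illustrative and not specified anywhere in the card.

    # Hypothetical few-shot template for eliciting structured math answers
    # from a base (non-instruction-tuned) model.
    FEW_SHOT_PROMPT = """\
    Problem: If 3x + 5 = 20, what is x?
    Solution: 3x = 20 - 5 = 15, so x = 15 / 3 = 5.
    Answer: 5

    Problem: What is 7 factorial divided by 5 factorial?
    Solution: 7! / 5! = 7 * 6 = 42.
    Answer: 42

    Problem: {question}
    Solution:"""

    def build_prompt(question: str) -> str:
        """Fill the few-shot template with a new math question."""
        return FEW_SHOT_PROMPT.format(question=question)

    print(build_prompt("What is the greatest common divisor of 48 and 180?"))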