anujjamwal/OpenMath-Nemotron-1.5B-PruneAware

Hugging Face
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Mar 5, 2026 · Architecture: Transformer

OpenMath-Nemotron-1.5B-PruneAware by Anuj Jamwal is a 1.5 billion parameter language model fine-tuned for mathematical reasoning. It implements 'Cognitive Compression,' a novel approach that generates hierarchical, structured chains of thought which can be actively pruned during inference. This method allows for the dynamic replacement of solved subproblem reasoning with a summary and solution, significantly reducing context window pressure compared to traditional append-only Chain-of-Thought methods. The model is designed to maintain solution quality while improving efficiency in complex reasoning tasks.


OpenMath-Nemotron-1.5B-PruneAware: Efficient Mathematical Reasoning

This model, developed by Anuj Jamwal, is a 1.5 billion parameter language model specifically fine-tuned for mathematical and complex reasoning tasks. It introduces a unique approach called Cognitive Compression to enhance inference efficiency while preserving solution quality.

Key Capabilities & Differentiators

  • Cognitive Compression: Unlike traditional Chain-of-Thought (CoT) methods that are append-only, this model generates hierarchical, structured chains of thought. This allows for the active pruning of reasoning steps for solved subproblems.
  • Context Window Optimization: Once a subproblem is solved, its full CoT can be discarded and replaced with a concise summary and solution. This dramatically reduces context window pressure, making reasoning more efficient.
  • Hierarchical Reasoning: The model breaks down complex problems into subproblems, enabling a more structured and manageable reasoning process.
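The pruning idea above can be sketched in plain Python. This is a minimal, hypothetical illustration: the structured-trace format (a list of subproblem records) and the `prune_solved` helper are assumptions for clarity, not the model's actual output schema or API.

```python
# Hypothetical sketch of Cognitive Compression: once a subproblem is marked
# solved, its full chain of thought is dropped from the running context and
# replaced by a concise summary. The record fields are illustrative only.

def prune_solved(context: list[dict]) -> list[dict]:
    """Replace each solved subproblem's full reasoning with its summary."""
    pruned = []
    for step in context:
        if step.get("solved"):
            pruned.append({
                "subproblem": step["subproblem"],
                "summary": step["summary"],  # concise result, e.g. "x = 3"
                "solved": True,
            })
        else:
            pruned.append(step)  # keep in-progress reasoning verbatim
    return pruned

context = [
    {"subproblem": "Solve 2x + 1 = 7",
     "reasoning": "Subtract 1 from both sides: 2x = 6. Divide by 2: x = 3.",
     "summary": "x = 3",
     "solved": True},
    {"subproblem": "Compute x^2 + 1",
     "reasoning": "Substituting x = 3 into x^2 + 1 ...",
     "solved": False},
]

compact = prune_solved(context)
```

After pruning, the solved subproblem contributes only its summary line to the context, while the unsolved one keeps its full reasoning, which is what keeps context-window pressure low as the number of solved subproblems grows.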

Training Details

The model is a fine-tuned version of an existing Nemotron-1.5B model, trained with supervised fine-tuning (SFT) using the TRL library. Cognitive Compression was developed as part of a project titled "Cognitive Compression: Hierarchical Chain of Thought for Efficient LLM Reasoning" at Stanford University.

When to Use This Model

This model is particularly well-suited for applications requiring efficient and structured reasoning, especially in mathematical or logical problem-solving where managing context length is crucial. Its ability to compress reasoning steps makes it valuable for scenarios where long, detailed chains of thought would otherwise exhaust the context window.