anujjamwal/OpenMath-Nemotron-1.5B-PruneAware

Hugging Face
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Mar 5, 2026 · Architecture: Transformer

OpenMath-Nemotron-1.5B-PruneAware by Anuj Jamwal is a 1.5 billion parameter language model fine-tuned for mathematical reasoning. It implements 'Cognitive Compression,' a novel approach that generates hierarchical, structured chains of thought which can be actively pruned during inference. This method allows for the dynamic replacement of solved subproblem reasoning with a summary and solution, significantly reducing context window pressure compared to traditional append-only Chain-of-Thought methods. The model is designed to maintain solution quality while improving efficiency in complex reasoning tasks.


OpenMath-Nemotron-1.5B-PruneAware: Efficient Mathematical Reasoning

This model, developed by Anuj Jamwal, is a 1.5 billion parameter language model specifically fine-tuned for mathematical and complex reasoning tasks. It introduces a unique approach called Cognitive Compression to enhance inference efficiency while preserving solution quality.

Key Capabilities & Differentiators

  • Cognitive Compression: Unlike traditional Chain-of-Thought (CoT) methods that are append-only, this model generates hierarchical, structured chains of thought. This allows for the active pruning of reasoning steps for solved subproblems.
  • Context Window Optimization: Once a subproblem is solved, its full CoT can be discarded and replaced with a concise summary and solution. This dramatically reduces context window pressure, making reasoning more efficient.
  • Hierarchical Reasoning: The model breaks down complex problems into subproblems, enabling a more structured and manageable reasoning process.
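The pruning idea above can be sketched in plain Python. This is a minimal, hypothetical illustration: the structured-trace format (a list of subproblem records) and the `prune_solved` helper are assumptions for clarity, not the model's actual output schema or API.

```python
# Hypothetical sketch of Cognitive Compression: once a subproblem is marked
# solved, its full chain of thought is dropped from the running context and
# replaced by a concise summary. The record fields are illustrative only.

def prune_solved(context: list[dict]) -> list[dict]:
    """Replace each solved subproblem's full reasoning with its summary."""
    pruned = []
    for step in context:
        if step.get("solved"):
            pruned.append({
                "subproblem": step["subproblem"],
                "summary": step["summary"],  # concise result, e.g. "x = 3"
                "solved": True,
            })
        else:
            pruned.append(step)  # keep in-progress reasoning verbatim
    return pruned

context = [
    {"subproblem": "Solve 2x + 1 = 7",
     "reasoning": "Subtract 1 from both sides: 2x = 6. Divide by 2: x = 3.",
     "summary": "x = 3",
     "solved": True},
    {"subproblem": "Compute x^2 + 1",
     "reasoning": "Substituting x = 3 into x^2 + 1 ...",
     "solved": False},
]

compact = prune_solved(context)
```

After pruning, the solved subproblem contributes only its summary line to the context, while the unsolved one keeps its full reasoning, which is what keeps context-window pressure low as the number of solved subproblems grows.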

Training Details

The model is a fine-tuned version of an existing Nemotron-1.5B model, trained with supervised fine-tuning (SFT) using the TRL library. Cognitive Compression was developed as part of a project titled "Cognitive Compression: Hierarchical Chain of Thought for Efficient LLM Reasoning" at Stanford University.

When to Use This Model

This model is particularly well-suited for applications requiring efficient and structured reasoning, especially in mathematical or logical problem-solving where managing context length is crucial. Its ability to compress reasoning steps makes it valuable for scenarios where long, detailed chains of thought would otherwise exhaust the context window.