ductaiphan/NanoReason-3B
Hugging Face
Text Generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Feb 28, 2026 · Architecture: Transformer

NanoReason-3B by Phan Duc Tai is a 3.1 billion parameter small language model (SLM) fine-tuned for verifiable Chain-of-Thought (CoT) reasoning, specifically in mathematical tasks. It utilizes a novel Step-Aware LoRA technique to distill complex reasoning from larger models, explicitly breaking down the reasoning process into a 4-stage cognitive graph. This model is optimized for generating step-by-step, verifiable solutions, making it suitable for educational technology and deterministic reasoning applications.


NanoReason-3B: Verifiable Chain-of-Thought Reasoning

NanoReason-3B is a 3.1 billion parameter small language model (SLM) developed by Phan Duc Tai, optimized for mathematical reasoning and verifiable Chain-of-Thought (CoT) generation. It was fine-tuned using a novel Step-Aware LoRA technique, which distills complex reasoning capabilities from larger teacher models into an efficient architecture.

Key Innovations (Step-Aware LoRA)

Unlike traditional knowledge distillation, NanoReason-3B employs:

  • Hierarchical Step Supervision: The reasoning process is explicitly structured into a 4-stage cognitive graph: [UNDERSTAND] → [PLAN] → [EXECUTE] → [VERIFY].
  • Step-Aware Loss Function: A weighted loss mechanism dynamically penalizes errors based on the cognitive stage, enhancing reasoning accuracy.
  • Verification-Aware Architecture (Dual LoRA): A primary Reasoning LoRA handles logic, while a secondary Verification LoRA is conditionally activated during the VERIFY step for self-correction.
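The exact loss formulation is not published in the model card; the sketch below illustrates the general idea of a step-aware weighted loss under assumed per-stage weights. The weight values and the token-level schema are hypothetical, not NanoReason-3B's actual hyperparameters.

```python
import math

# Hypothetical per-stage weights: errors in [PLAN] and [VERIFY] are assumed
# to be penalized more heavily than errors in the other stages.
STAGE_WEIGHTS = {"UNDERSTAND": 0.8, "PLAN": 1.5, "EXECUTE": 1.0, "VERIFY": 1.5}

def step_aware_loss(token_probs, token_stages):
    """Stage-weighted negative log-likelihood over one CoT trajectory.

    token_probs  -- probability the model assigned to each gold token
    token_stages -- the cognitive stage each token belongs to
    """
    total = 0.0
    for p, stage in zip(token_probs, token_stages):
        total += -STAGE_WEIGHTS[stage] * math.log(p)
    return total / len(token_probs)
```

With identical token probabilities, a mistake made during the [PLAN] stage contributes more to the loss than the same mistake made during [EXECUTE], which is the intended effect of weighting by cognitive stage.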

Evaluation & Metrics

Beyond standard accuracy, NanoReason-3B is evaluated using novel metrics:

  • RFS (Reasoning Faithfulness Score): Measures how closely the student's reasoning steps align with the teacher model's logical dependencies.
  • SVR (Self-Verification Rate): Tracks the model's ability to detect and correct its own errors during the [VERIFY] stage.
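The model card does not specify the exact SVR protocol. One plausible sketch, assuming an evaluation set where errors are deliberately seeded into trajectories and the model's [VERIFY] output is checked for a fix (the record schema here is invented for illustration):

```python
def self_verification_rate(records):
    """SVR sketch: fraction of seeded errors the model corrects in [VERIFY].

    records -- list of dicts with hypothetical keys:
      'error_injected'       -- True if an error was seeded into the trajectory
      'corrected_in_verify'  -- True if the [VERIFY] stage fixed that error
    """
    injected = [r for r in records if r["error_injected"]]
    if not injected:
        return 0.0
    fixed = sum(1 for r in injected if r["corrected_in_verify"])
    return fixed / len(injected)
```

Trajectories without a seeded error are excluded from the denominator, so the metric isolates the model's ability to catch mistakes rather than its baseline accuracy.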

Training Details

  • Hardware: Trained on 2x NVIDIA T4 GPUs using 4-bit QLoRA.
  • Dataset: 15,000 parsed CoT trajectories from GSM8K, MATH, and VNHSGE.
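A 4-bit QLoRA setup of this kind is typically expressed with `transformers` and `peft`; the fragment below is a minimal sketch of such a configuration. The rank, alpha, dropout, and target modules are assumptions for illustration, not the model's published hyperparameters.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the base model (QLoRA).
# T4 GPUs lack bfloat16 support, so compute runs in float16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Hypothetical adapter config for the primary Reasoning LoRA;
# the Verification LoRA would be a second, separately trained adapter.
reasoning_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```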

Limitations

The model is heavily optimized for mathematical and deterministic reasoning. It has not been tuned for open-ended creative writing or coding tasks, and hallucinations may occur if the [PLAN] stage formulates an impossible strategy.