ductaiphan/NanoReason-3B
Hugging Face
Text Generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Feb 28, 2026 · Architecture: Transformer

NanoReason-3B by Phan Duc Tai is a 3.1 billion parameter small language model (SLM) fine-tuned for verifiable Chain-of-Thought (CoT) reasoning, specifically in mathematical tasks. It utilizes a novel Step-Aware LoRA technique to distill complex reasoning from larger models, explicitly breaking down the reasoning process into a 4-stage cognitive graph. This model is optimized for generating step-by-step, verifiable solutions, making it suitable for educational technology and deterministic reasoning applications.


NanoReason-3B: Verifiable Chain-of-Thought Reasoning

NanoReason-3B is a 3.1 billion parameter small language model (SLM) developed by Phan Duc Tai, optimized for mathematical reasoning and verifiable Chain-of-Thought (CoT) generation. It was fine-tuned using a novel Step-Aware LoRA technique, which distills complex reasoning capabilities from larger teacher models into an efficient architecture.

Key Innovations (Step-Aware LoRA)

Unlike traditional knowledge distillation, NanoReason-3B employs:

  • Hierarchical Step Supervision: The reasoning process is explicitly structured into a 4-stage cognitive graph: [UNDERSTAND] → [PLAN] → [EXECUTE] → [VERIFY].
  • Step-Aware Loss Function: A weighted loss mechanism dynamically penalizes errors based on the cognitive stage, enhancing reasoning accuracy.
  • Verification-Aware Architecture (Dual LoRA): A primary Reasoning LoRA handles logic, while a secondary Verification LoRA is conditionally activated during the VERIFY step for self-correction.
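The exact loss formulation is not published in the model card; the sketch below illustrates the general idea of a step-aware weighted loss under assumed per-stage weights. The weight values and the token-level schema are hypothetical, not NanoReason-3B's actual hyperparameters.

```python
import math

# Hypothetical per-stage weights: errors in [PLAN] and [VERIFY] are assumed
# to be penalized more heavily than errors in the other stages.
STAGE_WEIGHTS = {"UNDERSTAND": 0.8, "PLAN": 1.5, "EXECUTE": 1.0, "VERIFY": 1.5}

def step_aware_loss(token_probs, token_stages):
    """Stage-weighted negative log-likelihood over one CoT trajectory.

    token_probs  -- probability the model assigned to each gold token
    token_stages -- the cognitive stage each token belongs to
    """
    total = 0.0
    for p, stage in zip(token_probs, token_stages):
        total += -STAGE_WEIGHTS[stage] * math.log(p)
    return total / len(token_probs)
```

With identical token probabilities, a mistake made during the [PLAN] stage contributes more to the loss than the same mistake made during [EXECUTE], which is the intended effect of weighting by cognitive stage.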

Evaluation & Metrics

Beyond standard accuracy, NanoReason-3B is evaluated using novel metrics:

  • RFS (Reasoning Faithfulness Score): Measures how closely the student's reasoning steps align with the teacher model's logical dependencies.
  • SVR (Self-Verification Rate): Tracks the model's ability to detect and correct its own errors during the [VERIFY] stage.
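The model card does not specify the exact SVR protocol. One plausible sketch, assuming an evaluation set where errors are deliberately seeded into trajectories and the model's [VERIFY] output is checked for a fix (the record schema here is invented for illustration):

```python
def self_verification_rate(records):
    """SVR sketch: fraction of seeded errors the model corrects in [VERIFY].

    records -- list of dicts with hypothetical keys:
      'error_injected'       -- True if an error was seeded into the trajectory
      'corrected_in_verify'  -- True if the [VERIFY] stage fixed that error
    """
    injected = [r for r in records if r["error_injected"]]
    if not injected:
        return 0.0
    fixed = sum(1 for r in injected if r["corrected_in_verify"])
    return fixed / len(injected)
```

Trajectories without a seeded error are excluded from the denominator, so the metric isolates the model's ability to catch mistakes rather than its baseline accuracy.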

Training Details

  • Hardware: Trained on 2x NVIDIA T4 GPUs using 4-bit QLoRA.
  • Dataset: 15,000 parsed CoT trajectories from GSM8K, MATH, and VNHSGE.
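A 4-bit QLoRA setup of this kind is typically expressed with `transformers` and `peft`; the fragment below is a minimal sketch of such a configuration. The rank, alpha, dropout, and target modules are assumptions for illustration, not the model's published hyperparameters.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the base model (QLoRA).
# T4 GPUs lack bfloat16 support, so compute runs in float16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Hypothetical adapter config for the primary Reasoning LoRA;
# the Verification LoRA would be a second, separately trained adapter.
reasoning_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```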

Limitations

The model is heavily optimized for mathematical and deterministic reasoning. It has not been tuned for open-ended creative writing or coding tasks, and hallucinations may occur if the [PLAN] stage formulates an impossible strategy.