# NanoReason-3B: Verifiable Chain-of-Thought Reasoning
NanoReason-3B is a 3.1 billion parameter small language model (SLM) developed by Phan Duc Tai, optimized for mathematical reasoning and verifiable Chain-of-Thought (CoT) generation. It was fine-tuned using a novel Step-Aware LoRA technique, which distills complex reasoning capabilities from larger teacher models into an efficient architecture.
## Key Innovations (Step-Aware LoRA)
Unlike traditional knowledge distillation, NanoReason-3B employs:
- Hierarchical Step Supervision: The reasoning process is explicitly structured into a 4-stage cognitive graph: `UNDERSTAND → PLAN → EXECUTE → VERIFY`.
- Step-Aware Loss Function: A weighted loss mechanism dynamically penalizes errors according to the cognitive stage in which they occur, improving reasoning accuracy.
- Verification-Aware Architecture (Dual LoRA): A primary Reasoning LoRA handles the core logic, while a secondary Verification LoRA is conditionally activated during the `VERIFY` step for self-correction.
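The step-aware loss can be sketched as a stage-weighted average of per-token losses. This is an illustrative reconstruction, not the released training code: the stage weights and the `step_aware_loss` helper below are assumptions chosen to show the mechanism.

```python
# Hypothetical per-stage penalty weights (assumption for illustration):
# errors in PLAN and VERIFY are penalized more heavily than in
# UNDERSTAND or EXECUTE, since they tend to derail the whole solution.
STAGE_WEIGHTS = {"UNDERSTAND": 0.8, "PLAN": 1.2, "EXECUTE": 1.0, "VERIFY": 1.5}

def step_aware_loss(token_nlls, token_stages):
    """Weighted average negative log-likelihood over a CoT trajectory.

    token_nlls   -- per-token NLL values from the language model
    token_stages -- the cognitive stage tag each token belongs to
    """
    weighted = [STAGE_WEIGHTS[s] * nll for nll, s in zip(token_nlls, token_stages)]
    total_weight = sum(STAGE_WEIGHTS[s] for s in token_stages)
    return sum(weighted) / total_weight
```

In practice this would replace the uniform token average in the fine-tuning objective, so a mistake inside `VERIFY` costs more than the same mistake inside `UNDERSTAND`.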
## Evaluation & Metrics
Beyond standard accuracy, NanoReason-3B is evaluated using novel metrics:
- RFS (Reasoning Faithfulness Score): Measures how closely the model's reasoning aligns with the teacher model's logical dependencies.
- SVR (Self-Verification Rate): Tracks the model's ability to detect and correct its own errors during the `[VERIFY]` stage.
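A minimal way to approximate SVR over a batch of transcripts is to check whether the `[VERIFY]` section flags a correction. The marker keywords and the `self_verification_rate` helper below are heuristic assumptions; the card does not publish the exact scoring protocol.

```python
import re

def self_verification_rate(transcripts):
    """Fraction of transcripts whose [VERIFY] section flags a self-correction.

    Heuristic sketch: a transcript counts as self-correcting if the text
    after its [VERIFY] tag contains a correction marker. The marker list
    is an assumption, not the official SVR definition.
    """
    corrected = 0
    checked = 0
    for t in transcripts:
        m = re.search(r"\[VERIFY\](.*)", t, re.DOTALL)
        if not m:
            continue  # no VERIFY stage produced; excluded from the rate
        checked += 1
        if re.search(r"incorrect|mistake|correcting|revise", m.group(1), re.I):
            corrected += 1
    return corrected / checked if checked else 0.0
```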
## Training Details
- Hardware: Trained on 2x NVIDIA T4 GPUs using 4-bit QLoRA.
- Dataset: 15,000 parsed CoT trajectories from GSM8K, MATH, and VNHSGE.
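Preparing such a dataset means splitting each raw trajectory into its four tagged stages. The sketch below assumes the stages are delimited with bracketed tags matching the card's notation (`[UNDERSTAND]`, `[PLAN]`, ...); the actual parsing pipeline is not published.

```python
import re

STAGES = ["UNDERSTAND", "PLAN", "EXECUTE", "VERIFY"]

def parse_trajectory(text):
    """Split a tagged CoT trajectory into its four stage segments.

    Expects bracketed stage tags, e.g. "[UNDERSTAND] ... [PLAN] ...".
    The tag format is an assumption based on the card's notation.
    """
    pattern = r"\[(" + "|".join(STAGES) + r")\]"
    parts = re.split(pattern, text)
    # re.split with a capture group yields: [prefix, tag, body, tag, body, ...]
    return {tag: body.strip() for tag, body in zip(parts[1::2], parts[2::2])}
```

A trajectory that fails to parse into all four stages would be dropped or re-annotated before fine-tuning.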
## Limitations
The model is heavily optimized for mathematical and deterministic reasoning. It has not been explicitly tuned for open-ended creative writing or coding tasks, and hallucinations may occur if the `[PLAN]` stage formulates an infeasible strategy.