difanjiao/ThinkTwice-Qwen3-4B-Instruct

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 3, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The difanjiao/ThinkTwice-Qwen3-4B-Instruct is a 4 billion parameter instruction-tuned causal language model, fine-tuned from Qwen3-4B-Instruct using the ThinkTwice framework. This model is specifically optimized for complex reasoning tasks and self-refinement capabilities, allowing it to not only solve problems but also iteratively improve its own solutions. It demonstrates significant performance gains on mathematical reasoning benchmarks like AIME, making it suitable for applications requiring robust problem-solving and solution verification.

Loading preview...

ThinkTwice-Qwen3-4B-Instruct Overview

This model, developed by Difan Jiao and collaborators, is a 4 billion parameter instruction-tuned language model based on the Qwen3-4B-Instruct architecture. Its core innovation lies in the ThinkTwice framework, a two-phase GRPO-based training approach that jointly optimizes the model for both solving reasoning problems and refining its own solutions. This framework utilizes a binary correctness reward without requiring explicit correctness signals or critique annotations.

Key Capabilities & Differentiators

  • Joint Reasoning and Self-Refinement: Uniquely trained to not only generate initial solutions but also to critically evaluate and improve them in a subsequent step.
  • Implicit Rectify-then-Fortify Curriculum: The ThinkTwice framework fosters a natural progression where the model first corrects errors and then learns to preserve already-correct solutions.
  • Enhanced Mathematical Reasoning: Achieves notable performance improvements on benchmarks like AIME, outperforming standard GRPO-trained Qwen3-4B by +5 percentage points before refinement and +11.5 percentage points after one self-refinement step (pass@4).
  • Two-Pass Usage: Designed for a two-step inference process: first, solve the problem; second, refine the initial solution.

Ideal Use Cases

  • Complex Problem Solving: Excellent for tasks requiring logical deduction and multi-step reasoning, particularly in mathematical domains.
  • Automated Solution Verification: Suitable for applications where iterative improvement and self-correction of generated answers are beneficial.
  • Educational Tools: Can be leveraged in systems that help users understand and correct their reasoning processes.