bigai-NPR/NPR-4B

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Architecture: Transformer

bigai-NPR/NPR-4B is a Native Parallel Reasoner (NPR) model built on the Qwen3-4B backbone. It introduces a teacher-free framework for native parallel reasoning, enabling concurrent generation and evaluation of multiple reasoning branches through a three-stage, self-distilled training pipeline and a parallel-aware reinforcement learning algorithm (PAPO). The model is optimized for verifiable reasoning tasks such as symbolic math and programming, with reported performance gains of up to 24.5% and inference speedups of up to 4.6x over baselines.


Native Parallel Reasoner (NPR-4B) Overview

NPR-4B is a language model framework developed by bigai-NPR that performs native parallel reasoning. Unlike models that reason strictly sequentially, NPR-4B generates and evaluates multiple reasoning paths concurrently. This capability comes from a three-stage, self-distilled training pipeline and a specialized Parallel-Aware Policy Optimization (PAPO) reinforcement learning algorithm.
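A minimal loading sketch, assuming the checkpoint works with the standard `transformers` `AutoModelForCausalLM` API (the exact chat template and any parallel-decoding flags are not specified here and may differ; consult the repository for the supported invocation):

```python
MODEL_ID = "bigai-NPR/NPR-4B"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a completion for `prompt` with the NPR-4B checkpoint.

    Imports are kept inside the function so this sketch can be read and
    imported without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Usage (downloads the 4B checkpoint on first call):
# print(generate("What is the sum of the first 100 positive integers?"))
```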

Key Capabilities & Features

  • Native Parallel Reasoning: Generates and evaluates multiple reasoning branches simultaneously, leading to significant inference speedups.
  • Teacher-Free Framework: Learns parallel reasoning without requiring an external teacher model.
  • Three-Stage Training Curriculum: Includes format-discovery (NPR-ZERO), supervised parallel warmup (NPR-BETA), and native-parallel RL (PAPO) for robust training.
  • PAPO Algorithm: A reinforcement learning objective tailored for stable optimization of parallel decoding policies.
  • NPR-Engine: An engineered backend addressing memory, determinism, and correctness issues in large-scale parallel rollouts.
  • Performance Gains: Achieves up to 24.5% performance gains on aggregate metrics and 4.6x inference speedups compared to autoregressive decoding on verifiable reasoning tasks.
  • Genuine Parallelism: Exhibits near 100% genuine parallel execution, minimizing hidden autoregressive fallbacks.

Good For

  • Research: Advancing LLM reasoning capabilities through parallel decoding and reinforcement learning.
  • Verifiable Reasoning Tasks: Excels in symbolic math, programming, and other domains where outputs can be verified and used as reward signals.
  • Candidate-Diverse Solutions: Ideal for building systems requiring multiple candidate solutions quickly, such as best-of-k verification pipelines.
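To make the best-of-k use case concrete, here is a toy verification-pipeline sketch. Everything in it (the `best_of_k` helper, the example verifier) is illustrative and not part of the NPR release; it only shows how k parallel candidates combine with a task-specific verifier:

```python
def best_of_k(candidates, verify):
    """Return the first candidate accepted by the verifier, else None."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None

# Example: a verifiable math task. The candidates stand in for k
# reasoning branches produced in parallel; the verifier checks claimed
# answers to "sum of the first 100 positive integers" against the
# closed form n * (n + 1) / 2.
candidates = ["4950", "5050", "5150"]
answer = best_of_k(candidates, lambda c: int(c) == 100 * 101 // 2)
# answer == "5050"
```

In a real pipeline the verifier would be domain-specific, e.g. running unit tests for programming tasks or checking a symbolic-math result.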