Ujjwal-Tyagi/Baichuan-M2-32B

Text generation · Concurrency cost: 2 · Model size: 32.8B · Quant: FP8 · Context length: 32k · Published: Mar 30, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Baichuan-M2-32B is a 32.8 billion parameter medical-enhanced reasoning model developed by Baichuan AI, built upon Qwen2.5-32B. It features an innovative Large Verifier System and domain-specific fine-tuning on real-world medical questions. This model excels in medical reasoning tasks, achieving leading performance among open-source models on HealthBench, while maintaining strong general capabilities and supporting efficient 4-bit quantization for deployment.


Baichuan-M2-32B: A Leading Medical-Enhanced Reasoning Model

Baichuan-M2-32B, developed by Baichuan AI, is a 32.8 billion parameter model designed specifically for medical reasoning tasks. Built on Qwen2.5-32B, it incorporates a novel Large Verifier System that combines patient simulators with multi-dimensional verification mechanisms to improve medical accuracy and interaction quality. Training pairs medical domain adaptation via Mid-Training with a multi-stage reinforcement learning strategy that progressively strengthens medical knowledge, reasoning, and patient-interaction capabilities.

Key Capabilities & Features

  • World's Leading Open-Source Medical Model: Achieves top performance on HealthBench, outperforming other open-source and many proprietary models, with medical capabilities approaching GPT-5.
  • Doctor-Thinking Alignment: Trained on real clinical cases and patient simulators, demonstrating clinical diagnostic thinking and robust patient interaction.
  • Efficient Deployment: Supports 4-bit quantization, enabling deployment on a single RTX 4090, while the MTP (multi-token prediction) version delivers 58.5% higher token throughput in single-user scenarios.
  • Technical Innovations: Features a Large Verifier System with dynamic scoring, medical domain adaptation via Mid-Training, and multi-stage reinforcement learning.
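A rough sanity check on the single-RTX 4090 claim above: at 4-bit precision each parameter occupies about half a byte, so the weights alone come to roughly 16.4 GB, comfortably under the card's 24 GB of VRAM. This is only a lower bound; KV cache, activations, and quantization overhead add to the real footprint.

```python
# Back-of-envelope VRAM estimate for 4-bit deployment.
# Assumption: ~0.5 bytes per parameter for 4-bit weights; KV cache,
# activations, and quantization overhead are ignored here.

PARAMS = 32.8e9          # parameter count from the model card
BYTES_PER_PARAM = 0.5    # 4 bits = 0.5 bytes

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Approx. weight memory at 4-bit: {weights_gb:.1f} GB")  # -> 16.4 GB

RTX_4090_VRAM_GB = 24
print("Weights fit on a single RTX 4090:", weights_gb < RTX_4090_VRAM_GB)
```

The same arithmetic explains why the unquantized model (about 65 GB at FP16) needs multi-GPU or datacenter hardware, while 4-bit quantization brings it within consumer range.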

Performance Highlights

Baichuan-M2-32B leads open-source models on medical benchmarks, scoring 60.1 on HealthBench, while retaining strong general capabilities: 83.4 on AIME24 and 45.8 on Arena-Hard-v2.0.

Intended Use Cases

This model is suitable for medical education, health consultation, and clinical decision support, intended for research and reference under the guidance of medical professionals.