Overview

Fleming-R1-7B is a 7.6-billion-parameter medical reasoning model developed by UbiquantAI and built on the Qwen2.5-7B architecture. It is designed to analyze complex medical problems step by step and to give reliable answers. Training starts with a chain-of-thought cold start, followed by two-stage reinforcement learning that uses adaptive hard-negative mining to strengthen reasoning on challenging problems.

Key Capabilities

Specialized Medical Reasoning: Optimized for medical scenarios, capable of detailed step-by-step analysis.
State-of-the-Art Performance: Achieves leading results on multiple medical benchmarks among models of comparable size.
Enhanced Data Strategy: Combines public medical datasets with knowledge graphs to improve coverage of rare diseases, medications, and multi-hop reasoning chains.
Reinforcement Learning: Builds on high-quality reasoning traces from teacher models and applies adaptive hard-negative mining to strengthen problem-solving on challenging problems.
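
As a rough illustration of the general idea behind hard-negative mining, the sketch below selects the problems a model fails most often so that later training can focus on them. This is a generic example, not the procedure used to train Fleming-R1-7B: the function name `select_hard_problems`, the `max_pass_rate` threshold, and the shape of the `outcomes` data are all assumptions made for illustration.

```python
from typing import Dict, List


def select_hard_problems(
    outcomes: Dict[str, List[bool]],
    max_pass_rate: float = 0.5,
) -> List[str]:
    """Return ids of problems with a pass rate at or below max_pass_rate.

    `outcomes` maps a problem id to the pass/fail results of several
    sampled attempts (hypothetical data layout). Hardest problems come first.
    """
    pass_rates: Dict[str, float] = {}
    for problem_id, results in outcomes.items():
        if not results:
            # No attempts recorded, so no pass rate can be estimated.
            continue
        pass_rates[problem_id] = sum(results) / len(results)

    hard = [pid for pid, rate in pass_rates.items() if rate <= max_pass_rate]
    # Sort by ascending pass rate; break ties by id for a stable order.
    return sorted(hard, key=lambda pid: (pass_rates[pid], pid))


# Hypothetical results from four sampled attempts per problem.
example_outcomes = {
    "q1": [True, True, True, True],
    "q2": [False, False, True, False],
    "q3": [False, False, False, False],
    "q4": [],
}
hard_ids = select_hard_problems(example_outcomes)
```

An adaptive variant could re-estimate the pass rates after each training round, so that the selected set shifts as the model improves.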

Good For

Medical Research: Analyzing complex medical cases and generating reasoning traces for research purposes.
Non-Clinical Reference: Providing detailed information and step-by-step analysis for educational or informational use in medical contexts.
Benchmarking Medical LLMs: Evaluating and comparing the reasoning abilities of language models in healthcare domains, particularly on benchmarks like MedXpertQA.
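
For benchmarking on multiple-choice medical questions, one common approach is to extract the final answer letter from each reasoning trace and compare it with the gold label. The sketch below shows this under stated assumptions: the answer phrasing matched by `ANSWER_PATTERN`, the option range A to J, and the helper names `extract_choice` and `accuracy` are illustrative and are not taken from the model card or from any benchmark's official scorer.

```python
import re
from typing import List, Optional

# Assumed answer phrasing: "answer is (B)", "Answer: C", and similar.
# The option letter must be uppercase and must stand alone as a word.
ANSWER_PATTERN = re.compile(r"(?i:answer)\s*(?:(?i:is)|:)?\s*\(?([A-J])\b")


def extract_choice(output: str) -> Optional[str]:
    """Return the last answer letter found in a reasoning trace, or None."""
    matches = ANSWER_PATTERN.findall(output)
    return matches[-1] if matches else None


def accuracy(outputs: List[str], gold: List[str]) -> float:
    """Fraction of outputs whose extracted letter equals the gold letter."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


# Hypothetical model outputs and gold labels.
sample_outputs = [
    "Step 1: review the symptoms. Step 2: compare options. The answer is (B).",
    "The findings point to an infection. Final answer: C",
    "I cannot determine this from the information given.",
]
sample_gold = ["B", "C", "A"]
score = accuracy(sample_outputs, sample_gold)
```

Taking the last match rather than the first lets a trace revise an earlier guess; outputs with no recognizable answer are counted as incorrect.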
