Overview
DATEXIS/DeepICD-R1-zero-32B is a 32.8-billion-parameter clinical reasoning model developed by DATEXIS. It is designed for ICD-10-CM diagnosis outcome prediction from clinical admission notes and operates within the DeepICD-R1 framework. The model follows an "R1-Zero" training paradigm: it was trained primarily through GRPO-style reinforcement learning, without initial supervised fine-tuning. This approach encourages the model to discover reasoning strategies, such as chain-of-thought and self-verification, directly from reward signals.
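To make "GRPO-style" more concrete, the sketch below shows the group-relative advantage step commonly associated with GRPO: several completions are sampled for the same admission note, each receives a scalar reward, and each reward is normalized against its own group. The card does not specify the reward function or the exact normalization used for this model, so the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` term are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Illustrative only: the actual reward design and normalization used to
    train the model are not described in the card.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one note
# (e.g. 1.0 if the predicted code matched the reference, else 0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above their group's mean get a positive advantage and are reinforced; those below the mean are discouraged. No separate value model is needed for this step, since the group itself serves as the baseline.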
Key Capabilities
- ICD-10-CM Diagnosis Prediction: Predicts single ICD-10-CM codes from clinical text.
- Reasoning Trace Generation: Produces structured outputs that include a reasoning trace (<think> tags) explaining how the diagnosis was derived, followed by the predicted ICD-10-CM code (<diagnosis> tags).
- Reinforcement Learning for Reasoning: Demonstrates how RL alone can induce structured reasoning behaviors in large language models.
- Clinical NLP Research: Intended for research in clinical reasoning experiments and structured prediction from clinical notes.
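For research pipelines that consume the structured output described above, a minimal parsing sketch might look like the following. It assumes the tags appear as paired `<think>...</think>` and `<diagnosis>...</diagnosis>` elements, which the card implies but does not spell out. The names `parse_output` and `ParsedOutput` are hypothetical, and the regular expression is only an approximate check of ICD-10-CM code shape, not a lookup against the official code set.

```python
import re
from typing import NamedTuple, Optional

# Assumed tag layout: <think>...</think> followed by <diagnosis>...</diagnosis>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
DIAG_RE = re.compile(r"<diagnosis>(.*?)</diagnosis>", re.DOTALL)

# Approximate ICD-10-CM shape: letter, digit, alphanumeric, then an optional
# dot and 1-4 alphanumerics. This does not verify that the code exists.
ICD10CM_RE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


class ParsedOutput(NamedTuple):
    reasoning: Optional[str]  # text inside <think>, if present
    code: Optional[str]       # text inside <diagnosis>, if present
    well_formed: bool         # both parts present and code has a plausible shape


def parse_output(text: str) -> ParsedOutput:
    """Split a model completion into its reasoning trace and predicted code."""
    think = THINK_RE.search(text)
    diag = DIAG_RE.search(text)
    reasoning = think.group(1).strip() if think else None
    code = diag.group(1).strip().upper() if diag else None
    well_formed = (
        bool(reasoning)
        and code is not None
        and ICD10CM_RE.match(code) is not None
    )
    return ParsedOutput(reasoning, code, well_formed)


# Hypothetical completion, written by hand for illustration.
sample = (
    "<think>Fever and productive cough suggest pneumonia.</think>"
    "<diagnosis>J18.9</diagnosis>"
)
result = parse_output(sample)
```

A `well_formed` flag of `True` only means the output followed the expected format. It says nothing about whether the reasoning or the code is clinically correct, so parsed predictions still need expert review.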
Intended Use Cases
This model is strictly for research purposes, including:
- Clinical reasoning experiments.
- ICD-10-CM code prediction research.
- Reinforcement learning for language models.
- Reasoning trace generation and structured prediction from clinical notes.
Limitations and Ethical Considerations
- Trained primarily on English clinical notes from hospital-specific populations, leading to potential dataset biases and limitations with rare diagnoses.
- Reasoning traces, while convincing, may be incorrect, and predictions can fail for rare or long-tail diagnoses.
- Must not be used for medical diagnosis, clinical decision-making, or automated medical coding without expert supervision.
- Potential risks include propagation of dataset biases and overconfidence in generated reasoning. Expert oversight and fairness audits are crucial for responsible use.