What is DATEXIS/DeepICD-R1-Llama-8B?
This model is an 8-billion-parameter clinical reasoning model based on Llama 3.1, developed by DATEXIS and designed for single-label ICD-10-CM diagnosis prediction from admission notes. It is a GRPO (Group Relative Policy Optimization) post-trained variant, initialized from a Supervised Fine-Tuning (SFT) checkpoint, following the DeepICD-R1 training workflow.
Key Differentiators & Capabilities
- Specialized Clinical Reasoning: Optimized for medical NLP tasks, particularly ICD-10-CM diagnosis prediction.
- Advanced Training: Combines SFT with GRPO, using a custom reward made up of three components:
  - Format reward: Ensures structured output with reasoning traces and diagnosis tags.
  - Hierarchical outcome reward: Provides partial credit based on ICD prefix overlap (chapter, category, full code).
  - LLM-as-a-judge reward: Uses an external LLM to score reasoning quality.
- Structured Output: Generates outputs in the format <think>...reasoning trace...</think><diagnosis>ICD_CODE</diagnosis>, which keeps the reasoning behind each prediction visible for interpretability.
- Research Prototype: Based on the DeepICD-R1 paper, which reported macro-F1 scores for Llama3.1-8B-Instruct (SFT + GRPO) of 59.5 at chapter level, 15.6 at category level, and 4.3 for full ICD-10 codes.
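The format reward and hierarchical outcome reward listed above can be illustrated with a minimal Python sketch. This is not the DeepICD-R1 implementation: the regular expression, the function names, and the weights in LEVEL_WEIGHTS are all hypothetical. The first character of a code is used here as a rough chapter proxy, which is an assumption; ICD-10-CM chapters are defined as code ranges, so one letter can span more than one chapter.

```python
import re

# Assumed output layout, taken from the format described above:
# <think>...</think><diagnosis>ICD_CODE</diagnosis>
OUTPUT_PATTERN = re.compile(
    r"\s*<think>(?P<reasoning>.+?)</think>\s*"
    r"<diagnosis>(?P<code>[^<]+)</diagnosis>\s*",
    re.DOTALL,
)

# Hypothetical partial-credit weights; the real values are not stated here.
LEVEL_WEIGHTS = {"chapter": 0.2, "category": 0.5, "full": 1.0}


def normalize(code: str) -> str:
    """Upper-case a code and drop the dot, e.g. 'i50.9' -> 'I509'."""
    return code.strip().upper().replace(".", "")


def format_reward(output: str) -> float:
    """Return 1.0 if the whole output matches the think-then-diagnosis layout."""
    return 1.0 if OUTPUT_PATTERN.fullmatch(output) else 0.0


def hierarchical_reward(predicted: str, gold: str) -> float:
    """Give partial credit for the longest matching prefix level."""
    pred, ref = normalize(predicted), normalize(gold)
    if not pred or not ref:
        return 0.0
    if pred == ref:
        return LEVEL_WEIGHTS["full"]
    if pred[:3] == ref[:3]:  # same three-character category
        return LEVEL_WEIGHTS["category"]
    if pred[:1] == ref[:1]:  # same first letter (chapter proxy, see above)
        return LEVEL_WEIGHTS["chapter"]
    return 0.0


if __name__ == "__main__":
    sample = "<think>Dyspnea and leg edema.</think><diagnosis>I50.1</diagnosis>"
    print(format_reward(sample))
    print(hierarchical_reward("I50.1", "I50.9"))
```

With these assumed weights, a prediction that shares only the category with the gold code earns half credit, and one that shares only the first letter earns less. How the actual training run combines these terms with the LLM-as-a-judge score is not covered by this sketch.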
Intended Use Cases
This model is a research prototype and is intended for:
- Clinical reasoning research from admission notes.
- ICD-10-CM diagnosis outcome prediction studies.
- Reinforcement learning applications in medical language models.
- Generating reasoning traces for structured prediction tasks.
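For outcome prediction studies, predictions are typically scored at several levels of the ICD hierarchy. The following is a minimal sketch of macro-F1 at chapter, category, and full-code level, assuming the codes have already been extracted from the model output. The names LEVELS, truncate, and macro_f1_by_level are hypothetical, the first character is again only a proxy for the chapter, and averaging over every label seen in either gold or predicted codes is an assumption that may differ from the paper's evaluation protocol.

```python
from collections import defaultdict

# Prefix length per level; None keeps the full code. The one-character
# chapter proxy is an assumption, since real chapters are code ranges.
LEVELS = {"chapter": 1, "category": 3, "full": None}


def truncate(code, length):
    """Normalize a code (upper-case, no dot) and cut it to the given length."""
    code = code.strip().upper().replace(".", "")
    return code if length is None else code[:length]


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = []
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def macro_f1_by_level(gold, pred):
    """Compute macro-F1 after truncating both lists to each hierarchy level."""
    return {
        name: macro_f1(
            [truncate(g, n) for g in gold],
            [truncate(p, n) for p in pred],
        )
        for name, n in LEVELS.items()
    }


if __name__ == "__main__":
    gold = ["I50.9", "I50.9", "J18.9"]
    pred = ["I50.9", "I50.1", "J18.9"]
    print(macro_f1_by_level(gold, pred))
```

In this toy example the single wrong prediction still shares its category with the gold code, so only the full-code score drops. That mirrors the pattern in the reported results, where scores fall sharply from chapter level to full-code level.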
Important: This model is not intended for real-world diagnosis, treatment decisions, or autonomous clinical coding. Any practical application requires expert oversight and validation, because outputs may be clinically inaccurate and may reflect biases in the MIMIC-IV dataset.