Henrychur/DiagAgent-8B

Text generation | Model size: 8B | Quant: FP8 | Context length: 32k | Published: Aug 17, 2025 | License: apache-2.0 | Architecture: Transformer | Open weights

DiagAgent-8B by Henrychur is an 8-billion-parameter large language model optimized for interactive, multi-turn diagnostic reasoning in medical settings. It is trained with reinforcement learning (GRPO) in the DiagGym virtual clinical environment to recommend examinations, update its diagnosis as new evidence arrives, and decide when to finalize a diagnosis. In end-to-end evaluations it outperforms many larger general-purpose LLMs and agentic systems on diagnostic accuracy and examination-recommendation F1.


DiagAgent-8B: RL-Optimized Diagnostic Agent

DiagAgent-8B is an 8-billion-parameter large language model developed by Henrychur for interactive, multi-turn medical diagnostic reasoning. Unlike traditional one-shot medical LLMs, it is optimized through reinforcement learning (GRPO) inside the DiagGym virtual clinical environment. Because training happens in simulation, the model can safely learn complex diagnostic workflows: recommending informative examinations, updating its working diagnosis as new information becomes available, and deciding when to commit to a final diagnosis.
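What distinguishes GRPO (Group Relative Policy Optimization) from PPO-style training is how it builds the advantage: it samples a group of trajectories for the same prompt (here, one simulated patient case), scores each, and standardizes every reward against its own group instead of using a learned value network. The sketch below shows only that step. It is not DiagAgent's training code; the function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each trajectory's reward within its sampling group.

    Illustrative sketch of the GRPO advantage: (r_i - mean) / (std + eps).
    Using pstdev (population std) and eps=1e-6 are assumptions.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Trajectories that beat their group's average get a positive advantage,
    # worse ones a negative advantage; identical rewards all map to 0.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled diagnostic episodes for the same simulated case.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline is the group mean, a case that every sampled episode solves equally well contributes no gradient signal, which is the intended behavior of the group-relative design.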

Key Capabilities

  • Interactive Diagnostic Reasoning: Engages in multi-turn interactions to gather information and refine diagnoses.
  • Examination Recommendation: Suggests the most informative examinations based on patient data and current diagnostic hypotheses.
  • Dynamic Diagnosis Updates: Adapts its working diagnosis as new evidence is presented.
  • Decision to Finalize Diagnosis: Determines when sufficient information has been collected to make a conclusive diagnosis.
  • Reinforcement Learning Optimization: Trained end-to-end with GRPO in a closed-loop virtual environment, so the diagnostic policy is learned in simulation rather than on real patients.
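The capabilities above describe a closed loop: the agent either requests an examination or commits to a diagnosis, and the environment answers each request. A minimal sketch of that loop follows. Every name in it (`Action`, `SimulatedCase`, `run_episode`, `scripted_policy`) is hypothetical; the toy environment only looks up pre-recorded results, whereas DiagGym is described as a virtual clinical environment, and the rule-based policy merely stands in for DiagAgent's generated turns.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str   # hypothetical: "exam" to request a test, "diagnose" to finalize
    value: str  # examination name or final diagnosis


@dataclass
class SimulatedCase:
    """Hypothetical stand-in for a virtual clinical environment."""
    history: str
    exam_results: dict[str, str]

    def run_exam(self, exam: str) -> str:
        # A real simulator would produce a result; this toy one looks it up.
        return self.exam_results.get(exam, "not available")


Policy = Callable[[list[str]], Action]


def run_episode(case: SimulatedCase, policy: Policy,
                max_turns: int = 10) -> tuple[Optional[str], list[str]]:
    """Alternate policy turns and exam results until the policy diagnoses."""
    transcript = [f"History: {case.history}"]
    requested: list[str] = []
    for _ in range(max_turns):
        action = policy(transcript)
        if action.kind == "diagnose":
            return action.value, requested
        # New evidence is appended so the next turn can update the diagnosis.
        requested.append(action.value)
        transcript.append(f"{action.value}: {case.run_exam(action.value)}")
    return None, requested  # turn budget exhausted without a final diagnosis


def scripted_policy(transcript: list[str]) -> Action:
    # Toy rule-based policy for illustration only.
    if not any(line.startswith("troponin") for line in transcript):
        return Action("exam", "troponin")
    if "troponin: elevated" in transcript:
        return Action("diagnose", "acute myocardial infarction")
    return Action("diagnose", "unclear")


case = SimulatedCase("acute chest pain", {"troponin": "elevated"})
diagnosis, exams = run_episode(case, scripted_policy)
```

Keeping the turn budget (`max_turns`) inside the loop makes "deciding when to stop" an explicit part of the policy's job: an agent that never diagnoses simply runs out of turns.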

Performance Highlights

DiagAgent-8B performs strongly on medical diagnostic tasks, particularly in multi-turn scenarios. In end-to-end evaluations it reaches an examination-recommendation F1 of 43.02 and a diagnostic accuracy of 53.85, outperforming larger general-purpose LLMs such as GPT-4o, Claude-4-sonnet, and Llama3.3, as well as other agentic systems, on these metrics. Its training objective targets diagnostic accuracy and examination-recommendation F1 while keeping the number of interaction turns low.
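The card names three training signals (diagnostic accuracy, examination-recommendation F1, and few interaction turns) but not how they are combined. The sketch below assumes a simple weighted sum and computes exam F1 as set overlap between recommended and reference examinations. The weights `w_diag`, `w_exam`, and `turn_penalty`, the function names, and the convention of returning 0.0 when there is no overlap are all hypothetical, not DiagAgent's actual reward.

```python
def exam_f1(recommended: list[str], reference: list[str]) -> float:
    """Set-based F1 between recommended and reference examinations.

    Assumption: duplicates are ignored, and no overlap (including empty
    lists) yields 0.0.
    """
    rec, ref = set(recommended), set(reference)
    true_pos = len(rec & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(rec)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


def episode_reward(diagnosis_correct: bool, recommended: list[str],
                   reference: list[str], turns: int,
                   w_diag: float = 1.0, w_exam: float = 0.5,
                   turn_penalty: float = 0.05) -> float:
    """Hypothetical weighted reward: correctness + exam F1 - turn cost."""
    return (w_diag * float(diagnosis_correct)
            + w_exam * exam_f1(recommended, reference)
            - turn_penalty * turns)


# One extra exam beyond the reference lowers precision: F1 = 0.8.
f1 = exam_f1(["troponin", "ECG", "CT"], ["troponin", "ECG"])
```

The turn penalty is what discourages an agent from ordering every available test: each extra request costs reward even when it does not change the F1 term.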

Good for

  • Medical AI Assistants: Building AI systems that can simulate clinical diagnostic processes.
  • Clinical Decision Support: Assisting healthcare professionals with complex diagnostic pathways.
  • Medical Education & Training: Providing a safe, interactive environment for learning diagnostic reasoning.
  • Research in Agentic LLMs: Exploring the application of reinforcement learning for complex, multi-step reasoning tasks in specialized domains.