II-Medical-8B: Advanced Medical Reasoning Model
Overview
II-Medical-8B, developed by Intelligent Internet, is an 8-billion-parameter large language model built on the Qwen/Qwen3-8B architecture. It is specifically designed to enhance AI-driven medical reasoning and question-answering capabilities.
Key Capabilities & Training:
- Medical Reasoning: Engineered for advanced medical question answering, leveraging a comprehensive set of reasoning datasets.
- Training Methodology: Utilizes Supervised Fine-Tuning (SFT) on Qwen/Qwen3-8B, followed by DAPO optimization on a hard-reasoning dataset to boost performance.
- Extensive Dataset: Trained on over 555,000 samples, including public medical reasoning datasets, synthetic medical QA data generated with QwQ, and curated medical R1 traces.
- Data Curation: Employs a sophisticated data curation pipeline involving embedding generation, K-means clustering, domain classification, and rigorous decontamination to ensure data quality and relevance.
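The curation steps above can be sketched in miniature. This is an illustrative assumption, not the actual II-Medical-8B pipeline: the real embedding model, cluster count, and filtering criteria are not specified in this card. The sketch below clusters stand-in embedding vectors with a small NumPy K-means (Lloyd's algorithm) and keeps only samples close to their cluster centroid, as a stand-in for a quality filter.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in embeddings for 100 QA samples; a real pipeline would embed the
# text of each sample with a sentence-embedding model (assumption).
emb = rng.normal(size=(100, 32))

# Minimal K-means (Lloyd's algorithm); cluster count k=5 is arbitrary here.
k = 5
centers = emb[rng.choice(len(emb), size=k, replace=False)]
for _ in range(20):
    # Assign each sample to its nearest centroid.
    labels = np.argmin(np.linalg.norm(emb[:, None] - centers[None], axis=2), axis=1)
    # Recompute centroids (keep the old one if a cluster goes empty).
    centers = np.stack([
        emb[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(k)
    ])

# Simple quality filter: drop samples far from their cluster centroid.
dists = np.linalg.norm(emb - centers[labels], axis=1)
keep = dists < np.median(dists)
print(f"kept {keep.sum()} of {len(keep)} samples")
```

A production pipeline would add the other stages the card lists (domain classification and decontamination against evaluation benchmarks) after this clustering step.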
Performance & Evaluation:
- HealthBench Score: Achieved a 40% score on HealthBench, an open-source benchmark for healthcare LLMs, demonstrating performance comparable to OpenAI's o1 reasoning model and GPT-4.5.
- Benchmark Excellence: Shows strong results across ten medical QA benchmarks, including MedMCQA, MedQA, PubMedQA, MMLU-Pro, and GPQA, often outperforming other 8B-class medical models.
Usage Guidelines:
- Recommended Parameters: Use temperature = 0.6 and top_p = 0.9 for optimal sampling.
- Reasoning Format: Users are advised to explicitly request step-by-step reasoning and to format the final answer within \boxed{} for best results.
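The guidelines above can be combined into a single request. A minimal sketch, assuming an OpenAI-compatible chat payload (e.g., as served by a local inference server) and assuming the Hugging Face repository name `Intelligent-Internet/II-Medical-8B`; only the sampling parameters and the \boxed{} instruction come from this card:

```python
# Example medical question (illustrative only; not from the card).
question = "A 45-year-old presents with acute chest pain. What is the first-line test?"

# Chat-style request following the card's recommendations.
payload = {
    "model": "Intelligent-Internet/II-Medical-8B",  # assumed repo name
    "messages": [
        {
            "role": "user",
            # Explicitly request step-by-step reasoning and a \boxed{} answer.
            "content": question
            + "\nPlease reason step by step, and put your final answer within \\boxed{}.",
        }
    ],
    "temperature": 0.6,  # recommended sampling temperature
    "top_p": 0.9,        # recommended nucleus-sampling threshold
}
print(payload["temperature"], payload["top_p"])
```

The payload would then be POSTed to the server's chat-completions endpoint; the final answer can be parsed out of the returned text by searching for the \boxed{...} span.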
Limitations:
- The model's dataset may contain inherent biases from source materials.
- Medical knowledge requires regular updates, which may not be reflected in the current training data.
- Not suitable for direct medical use or clinical decision-making.