lastmass/Qwen3.5-Medical-GSPO
lastmass/Qwen3.5-Medical-GSPO is a 4.5 billion parameter Chinese medical reasoning model, fine-tuned from Qwen3.5-4B. It specializes in generating structured chain-of-thought (CoT) reasoning for medical questions, including clinical diagnosis and treatment planning. The model was developed using a two-stage pipeline involving Supervised Fine-Tuning (SFT) and Group Sequence Policy Optimization (GSPO) with an LLM-as-Judge reward function. This approach optimizes for medically reasonable final conclusions, making it suitable for applications requiring detailed medical explanations.
Qwen3.5-Medical-GSPO: Chinese Medical Reasoning Model
This model, developed by lastmass, is a 4.5 billion parameter variant of Qwen3.5-4B fine-tuned for Chinese medical reasoning. It generates structured chain-of-thought (CoT) explanations for complex medical queries, covering clinical diagnosis, treatment planning, and differential diagnosis.
Key Capabilities & Training
The model's unique strength comes from its two-stage training pipeline:
- Supervised Fine-Tuning (SFT): Initially trained on the FreedomIntelligence/medical-o1-reasoning-SFT dataset to establish a consistent output format: a `<think>...</think>` reasoning block followed by a concise final answer.
- Group Sequence Policy Optimization (GSPO): This reinforcement learning stage uses an LLM-as-Judge (DeepSeek-Chat) reward function. Crucially, the judge evaluates only the final conclusion, not the CoT, preventing reward hacking and ensuring medically sound answers. GSPO, a sequence-level variant of GRPO, enhances training stability over long reasoning sequences.
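The two stages can be illustrated with a minimal sketch. It is not the author's training code: `split_output`, `conclusion_reward`, and `sequence_ratio` are hypothetical helper names, and `judge` is a stand-in callable for whatever call is made to the LLM judge. The sketch shows how a `<think>...</think>` output might be split so that only the final answer reaches the judge, and how a GSPO-style sequence-level importance ratio (the length-normalized ratio of sequence likelihoods, as described in the GSPO formulation) differs from a per-token ratio.

```python
import math
import re
from typing import Callable, List, Tuple

# Matches the reasoning block that the SFT stage teaches the model to emit.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_output(text: str) -> Tuple[str, str]:
    """Return (reasoning, final_answer) from a '<think>...</think>answer' string."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


def conclusion_reward(
    output: str,
    reference: str,
    judge: Callable[[str, str], float],  # hypothetical stand-in for the LLM judge
) -> float:
    """Score only the final answer; the reasoning block never reaches the judge."""
    _, answer = split_output(output)
    return judge(answer, reference)


def sequence_ratio(new_logps: List[float], old_logps: List[float]) -> float:
    """Length-normalized sequence importance ratio.

    Equals (p_new(y|x) / p_old(y|x)) ** (1 / len(y)), computed from
    per-token log-probabilities under the new and old policies.
    """
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))
```

Because the ratio is averaged over the whole sequence before exponentiation, a single outlier token has less influence on the update than it would with a per-token ratio, which is the usual argument for better stability on long reasoning outputs.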
Use Cases & Limitations
This model is well suited to applications requiring detailed medical explanations and diagnostic reasoning in Chinese. It performs better on reasoning-heavy questions than on pure factual recall. It is a LoRA adapter trained on a relatively small dataset (~20k examples) and is not validated for clinical use. Its performance may be limited on rare diseases or highly specialized subspecialties, and all outputs should be reviewed by qualified medical professionals.