Name: lastmass/Qwen3.5-Medical-GSPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lastmass

Qwen3.5-Medical-GSPO: Chinese Medical Reasoning Model

This model, developed by lastmass, is a 4.5-billion-parameter variant of Qwen3.5-4B, fine-tuned for Chinese medical reasoning. It generates structured chain-of-thought (CoT) explanations for complex medical queries in areas such as clinical diagnosis, differential diagnosis, and treatment planning.

Key Capabilities & Training

The model is trained with a two-stage pipeline:

Supervised Fine-Tuning (SFT): Initially trained on the FreedomIntelligence/medical-o1-reasoning-SFT dataset to establish a consistent output format: a <think>...</think> reasoning block followed by a concise final answer.
Group Sequence Policy Optimization (GSPO): This reinforcement learning stage uses an LLM-as-Judge (DeepSeek-Chat) reward function. The judge scores only the final conclusion, not the CoT, which is intended to limit reward hacking and keep the reward tied to the medical soundness of the answer. GSPO is a sequence-level variant of GRPO: it computes the importance ratio and applies clipping per response rather than per token, which improves training stability over long reasoning sequences.
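
To make the sequence-level idea concrete, here is a minimal sketch of the quantities GSPO works with: a length-normalized importance ratio per response, rewards normalized within a group of responses to the same prompt, and a clipped surrogate objective. It is not the author's training code. The function names, the use of a population standard deviation, and the `clip_eps` default are assumptions made for illustration, and the rewards would come from the judge's score of each final answer.

```python
import math
from statistics import mean, pstdev


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence-level importance ratio for one response.

    logp_new / logp_old: per-token log-probabilities of the same sampled
    response under the current policy and the old (sampling) policy.
    """
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("need two non-empty lists of equal length")
    # exp(mean per-token log-ratio) == (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
    avg_log_ratio = sum(n - o for n, o in zip(logp_new, logp_old)) / len(logp_new)
    return math.exp(avg_log_ratio)


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratios, advantages, clip_eps=0.2):
    """Average clipped surrogate over the group (a value to maximize).

    clip_eps is a placeholder; the value used for this model is not stated.
    """
    terms = []
    for s, a in zip(ratios, advantages):
        clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        # One ratio per response, so clipping keeps or drops the whole sequence
        terms.append(min(s * a, clipped * a))
    return mean(terms)
```

Because each response contributes a single ratio, one unusually likely or unlikely token cannot dominate the update on its own, which is the usual explanation for the improved stability on long reasoning sequences.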

Use Cases & Limitations

This model is well suited to applications that need detailed medical explanations and diagnostic reasoning in Chinese. It performs better on reasoning-heavy questions than on pure factual recall. However, it is a LoRA adapter trained on a relatively small dataset (~20k examples) and is not validated for clinical use. Performance may be limited on rare diseases or highly specialized subspecialties, and all outputs should be reviewed by qualified medical professionals.
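
Since every output needs human review, an application may want to show the reasoning and the final answer separately. The sketch below splits a completion on the <think>...</think> format described above. The helper name `split_response` and the choice to return `None` when no complete reasoning block is found are assumptions for illustration, not part of the model or any API.

```python
import re

# Non-greedy match so only the first complete reasoning block is captured;
# DOTALL lets the block span multiple lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text):
    """Split a completion into (reasoning, answer).

    Returns reasoning=None when no complete <think>...</think> block is
    present, so the caller can flag the output for closer review.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    # Placeholder text, not real model output
    sample = "<think>step 1 ...\nstep 2 ...</think>\nFinal answer text"
    reasoning, answer = split_response(sample)
    print("REASONING:", reasoning)
    print("ANSWER:", answer)
```

A missing or truncated reasoning block (for example, when generation stops at a token limit) comes back as `None`, which gives a reviewer an explicit signal that the output did not follow the expected format.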
