lastmass/Qwen3_Medical_GRPO

Public
4B
BF16
32768
Jun 18, 2025
License: apache-2.0

Model Overview

lastmass/Qwen3_Medical_GRPO is a 4-billion-parameter Qwen3-based language model developed by lastmass and fine-tuned for medical applications. It builds on unsloth/Qwen3-4B-Base and is specialized for the healthcare domain through multi-stage supervised fine-tuning followed by reinforcement-learning optimization.

Key Capabilities & Training

  • Medical Domain Specialization: The model underwent multi-stage Supervised Fine-Tuning (SFT) to establish foundational medical knowledge and conversational abilities.
  • Enhanced Reasoning with GRPO: The model was further optimized with the Group Relative Policy Optimization (GRPO) algorithm, using several accuracy (ACC) reward functions designed for the different GRPO training stages.
  • Improved Accuracy and Reliability: GRPO training is intended to improve the model's accuracy, logical reasoning, and overall reliability when answering complex medical questions.
  • Structured Problem Solving: The model is designed to understand intricate medical problems, provide detailed logical analysis, and deliver well-structured solutions.
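
The model card does not publish its reward functions, so the following is only a minimal sketch of the general GRPO idea: several completions are sampled for one prompt, each is scored by a reward function, and each score is normalized against the group's mean and standard deviation. The `accuracy_reward` function, the "Answer:" marker it looks for, and the use of the population standard deviation are all assumptions made for illustration, not the author's implementation.

```python
import re
from statistics import mean, pstdev


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical ACC reward: 1.0 if the final answer matches, else 0.0.

    Assumption: the final answer is the text after the last "Answer:" marker.
    The actual reward functions used for this model are not documented.
    """
    answers = re.findall(r"Answer:\s*(.+)", completion)
    if not answers:
        return 0.0
    predicted = answers[-1].strip().lower()
    return 1.0 if predicted == reference.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    A_i = (r_i - mean(r)) / (std(r) + eps), computed over the completions
    sampled for a single prompt. Population std is an illustrative choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def score_group(completions, reference):
    """Score one group of sampled completions and return (rewards, advantages)."""
    rewards = [accuracy_reward(c, reference) for c in completions]
    return rewards, group_relative_advantages(rewards)
```

For example, `score_group(["... Answer: DKA", "Answer: HHS"], "DKA")` gives the first completion a positive advantage and the second a negative one. When every completion in a group receives the same reward, all advantages are zero and the group contributes no learning signal.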

Use Cases & Features

  • Clinical Reasoning Engine: A "think mode", activated by the <start_working_out> token, produces detailed, step-by-step diagnostic analysis and clinical reasoning, as demonstrated in examples such as Diabetic Ketoacidosis (DKA) and Bacterial Meningitis.
  • High-Performance Inference: The vLLM framework is recommended for efficient inference, as it supports parallel processing and optimized memory utilization.
  • Ollama Integration: A quantized Q4_K_M version is available for easy deployment via Ollama (ollama run lastmass/Qwen3_Medical_GRPO).
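
As a rough illustration of the think mode and vLLM usage described above, the sketch below builds a prompt ending in `<start_working_out>` and sends a batch of questions through vLLM. Several details are assumptions rather than documented behavior: the closing marker `<end_working_out>`, the plain-text prompt layout (the model may expect a chat template instead), and the sampling values. The helper names are hypothetical.

```python
MODEL_ID = "lastmass/Qwen3_Medical_GRPO"
THINK_START = "<start_working_out>"  # token named in the model card
THINK_END = "<end_working_out>"      # assumption: closing marker is not documented


def build_think_prompt(question: str) -> str:
    """Append the think-mode token so generation starts with the reasoning.

    Assumption: a raw prompt is sufficient; the real model may need its
    chat template applied first.
    """
    return f"{question.strip()}\n{THINK_START}"


def split_reasoning(generated: str):
    """Split generated text into (reasoning, answer) at the assumed end marker."""
    reasoning, sep, answer = generated.partition(THINK_END)
    if not sep:
        # No end marker found: treat everything as reasoning.
        return generated.strip(), ""
    return reasoning.replace(THINK_START, "").strip(), answer.strip()


def generate(questions, max_tokens=2048):
    """Run a batch of questions through vLLM (requires vllm and a GPU)."""
    from vllm import LLM, SamplingParams  # imported lazily; heavy dependency

    llm = LLM(model=MODEL_ID)
    # Sampling values are illustrative, not recommendations from the model card.
    params = SamplingParams(temperature=0.6, max_tokens=max_tokens)
    prompts = [build_think_prompt(q) for q in questions]
    outputs = llm.generate(prompts, params)  # prompts are processed as one batch
    return [split_reasoning(o.outputs[0].text) for o in outputs]
```

Passing all prompts to a single `llm.generate` call lets vLLM schedule them together, which is where the parallel-processing benefit comes from. For local use without a GPU server, the Ollama command listed above is the simpler route.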

Important Considerations

This model is intended for academic research and technical communication. Its output may contain errors or inaccuracies, so it should not replace professional medical advice or serve as a basis for clinical decisions.