Name: lastmass/Qwen3_Medical_GRPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lastmass

Model Overview

lastmass/Qwen3_Medical_GRPO is a 4-billion-parameter Qwen3-based language model developed by lastmass and fine-tuned for medical applications. It builds on unsloth/Qwen3-4B-Base and is specialized for the healthcare domain through multi-stage supervised fine-tuning followed by reinforcement learning with GRPO.

Key Capabilities & Training

Medical Domain Specialization: The model underwent multi-stage Supervised Fine-Tuning (SFT) to establish foundational medical knowledge and conversational abilities.
Enhanced Reasoning with GRPO: The model was further optimized with the Group Relative Policy Optimization (GRPO) algorithm. Different accuracy (ACC) reward functions were designed and applied at different GRPO training stages.
Improved Accuracy and Reliability: The GRPO training is intended to improve the model's accuracy, logical reasoning, and overall reliability when answering complex medical questions.
Structured Problem Solving: The model is designed to understand intricate medical problems, provide detailed logical analysis, and deliver well-structured solutions.
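
The group-relative idea behind GRPO can be illustrated with a short sketch. Several completions are sampled for the same question, each is scored by a reward function, and every reward is normalized against the mean and standard deviation of its own group. The reward function below (accuracy_reward) is hypothetical and stands in for the ACC rewards mentioned above, whose actual definitions are not given here. The use of the population standard deviation and the eps value are also assumptions, since implementations differ on both.

```python
from statistics import mean, pstdev


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical ACC-style reward: 1.0 if the reference answer
    appears in the completion (case-insensitive), otherwise 0.0."""
    return 1.0 if reference.strip().lower() in completion.lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    eps only guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four sampled completions for the same (made-up) question.
completions = [
    "The findings are most consistent with diabetic ketoacidosis.",
    "This looks like a hyperosmolar hyperglycemic state.",
    "Diagnosis: Diabetic Ketoacidosis (DKA).",
    "Probably lactic acidosis.",
]
rewards = [accuracy_reward(c, "diabetic ketoacidosis") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. No separate value network is needed for this baseline.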

Use Cases & Features

Clinical Reasoning Engine: The model features a "think mode", activated by the <start_working_out> token, that produces detailed, step-by-step diagnostic analysis and clinical reasoning. Demonstrated examples include Diabetic Ketoacidosis (DKA) and Bacterial Meningitis.
High-Performance Inference: Recommended for use with the vLLM framework for efficient inference, supporting parallel processing and optimized memory utilization.
Ollama Integration: A quantized Q4_K_M version is available for easy deployment via Ollama (ollama run lastmass/Qwen3_Medical_GRPO).
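
As a rough illustration of the think mode and the vLLM recommendation above, the sketch below builds prompts that end with the <start_working_out> token and passes them to vLLM in one batch. The prompt layout ("Question: ... / Answer:"), the helper names build_prompt and generate, and the sampling values are assumptions made for illustration. The model's actual chat template may differ and should be preferred where available. vLLM is imported lazily, so the prompt helper works without it installed.

```python
THINK_TOKEN = "<start_working_out>"  # token named in the section above
MODEL_ID = "lastmass/Qwen3_Medical_GRPO"


def build_prompt(question: str, think: bool = True) -> str:
    """Build a plain-text prompt (hypothetical layout, not the official template).

    With think=True the prompt ends with the think-mode token so that the
    model continues with step-by-step reasoning.
    """
    prompt = f"Question: {question.strip()}\nAnswer: "
    if think:
        prompt += THINK_TOKEN
    return prompt


def generate(prompts, model_id=MODEL_ID, max_tokens=1024, temperature=0.6):
    """Run a batch of prompts through vLLM and return the generated texts.

    Requires the vllm package and suitable hardware; the sampling values
    are illustrative assumptions, not recommended settings.
    """
    from vllm import LLM, SamplingParams  # lazy import

    llm = LLM(model=model_id)
    params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)  # prompts are processed as a batch
    return [out.outputs[0].text for out in outputs]


questions = [
    "A patient presents with polyuria, vomiting and fruity-smelling breath. "
    "What is the most likely diagnosis?",
    "Which CSF findings suggest bacterial meningitis?",
]
prompts = [build_prompt(q) for q in questions]
# texts = generate(prompts)  # uncomment on a machine with vLLM installed
```

Passing all prompts in a single generate call lets vLLM schedule them together, which is where the parallel-processing benefit mentioned above comes from.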

Important Considerations

This model is intended for academic research and technical communication. Its output should not replace professional medical advice or be used as a basis for clinical decisions due to potential errors or inaccuracies.