matthewchung74/Qwen2.5_3B-GRPO-medical-reasoning

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Feb 13, 2025Architecture:Transformer0.0K Warm

The matthewchung74/Qwen2.5_3B-GRPO-medical-reasoning model is a 3.1 billion parameter, transformer-based language model developed by Matthew Chung. Fine-tuned from Qwen2.5-3B-Instruct using Generalized Reinforcement Policy Optimization (GRPO), it is specifically optimized for medical reasoning tasks. This model excels at educational applications related to medical reasoning, incorporating custom reward functions for semantic correctness and perplexity.

Loading preview...

Model Overview

This model, developed by Matthew Chung, is a 3.1 billion parameter, transformer-based language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It leverages Generalized Reinforcement Policy Optimization (GRPO) to specialize in medical reasoning tasks, utilizing the Unsloth library for efficient training.

Key Capabilities

  • Medical Reasoning: Optimized for understanding and generating responses related to medical reasoning, trained on the FreedomIntelligence/medical-o1-reasoning-SFT dataset.
  • GRPO Fine-tuning: Incorporates custom reward functions for semantic correctness and perplexity, enhancing its medical domain performance.
  • Educational Use: Primarily intended for educational purposes in medical contexts.

Training Details

The model was trained on 25,117 samples from the FreedomIntelligence/medical-o1-reasoning-SFT dataset over approximately 14 hours on an NVIDIA RTX 3090. Training involved a learning rate of 5e-6, a batch size of 1, and 4-bit quantization, achieving a final loss of 0.001300 and a semantic score of 0.630995.

Limitations and Recommendations

  • Educational Use Only: Explicitly stated as not intended for direct medical advice, diagnosis, or treatment recommendations.
  • Potential for Inaccuracy: May generate incorrect or misleading medical information.
  • Human Oversight Required: Not suitable for high-stakes medical decision-making without professional human oversight.
  • Bias Awareness: Users should be aware of potential biases inherited from the training data and verify outputs with medical professionals.