mlxha/Qwen3-4B-grpo-medmcqa
Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Architecture: Transformer · Published: May 6, 2025

The mlxha/Qwen3-4B-grpo-medmcqa model is a 4-billion-parameter language model based on the Qwen/Qwen3-4B architecture, fine-tuned by mlxha. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning fine-tuning method, on the medmcqa-grpo dataset, specializing it for medical multiple-choice question answering and strengthening its reasoning in that domain.
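Since the model targets multiple-choice medical questions, prompts are typically a question followed by lettered options. The exact template used during training is not documented here, so the helper below is a hypothetical sketch of one reasonable MedMCQA-style layout; the resulting string can then be passed to the model through any standard inference library.

```python
def build_mcq_prompt(question: str, options: dict) -> str:
    """Render a medical multiple-choice question as a single prompt string.

    NOTE: this layout is an assumption, not the documented training template.
    """
    lines = [question]
    # List options in letter order, e.g. "A. Vitamin A".
    for letter in sorted(options):
        lines.append(f"{letter}. {options[letter]}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)

prompt = build_mcq_prompt(
    "Which vitamin deficiency causes scurvy?",
    {"A": "Vitamin A", "B": "Vitamin B12", "C": "Vitamin C", "D": "Vitamin D"},
)
print(prompt)
```

If the model's outputs look off, it is worth experimenting with the option separator and the final instruction line, since RL-fine-tuned models can be sensitive to the prompt format they saw during training.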
