UCSC-VLAA/MedVLThinker-7B-SFT_PMC

Modality: Vision · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 32k · Published: Aug 2, 2025 · License: apache-2.0 · Architecture: Transformer

MedVLThinker-7B-SFT_PMC is a 7 billion parameter medical vision-language model developed by UCSC-VLAA, built on the Qwen2.5-VL-7B-Instruct base architecture. The model is fine-tuned with supervised learning on the PMC-VQA dataset to strengthen medical image understanding and reasoning. It interprets medical images and generates relevant textual responses, making it suitable for specialized medical AI applications.


MedVLThinker-7B-SFT_PMC: Medical Vision-Language Model

MedVLThinker-7B-SFT_PMC is a 7 billion parameter medical vision-language model developed by UCSC-VLAA. It is built on the Qwen2.5-VL-7B-Instruct base model and has been specifically fine-tuned using supervised learning on the PMC-VQA dataset.

Key Capabilities

  • Medical Image Understanding: Designed to interpret and reason about medical images.
  • Vision-Language Integration: Combines visual input from medical images with natural language processing for comprehensive analysis.
  • Specialized Training: Benefits from supervised fine-tuning on the PMC-VQA dataset, enhancing its performance in medical contexts.
  • Reasoning Capabilities: Uses a system prompt that encourages the model to reason inside `<think>` tags before giving its final answer in `<answer>` tags.
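The reasoning format above lends itself to a small output parser. The sketch below is illustrative, not the model's official tooling: the system prompt wording is an assumption (the exact prompt ships with the model's chat template), and `parse_response` is a hypothetical helper for splitting a response into its reasoning trace and final answer.

```python
import re

# Assumed reasoning-style system prompt; the exact wording bundled with the
# model may differ -- this is an illustrative placeholder.
SYSTEM_PROMPT = (
    "You first think about the reasoning process in your mind and then "
    "provide the answer. The reasoning is enclosed in <think> </think> "
    "tags and the final answer in <answer> </answer> tags."
)

def parse_response(text: str) -> dict:
    """Split a model response into its reasoning trace and final answer."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }

sample = "<think>The opacity is in the right lower lobe.</think><answer>B</answer>"
print(parse_response(sample))  # → {'reasoning': 'The opacity is in the right lower lobe.', 'answer': 'B'}
```

Falling back to `None` when a tag is missing lets downstream code detect malformed generations instead of crashing on them.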

Good For

  • Medical AI Applications: Ideal for tasks requiring the analysis and interpretation of medical imagery.
  • Research in Medical Vision-Language: A strong baseline for further development and research in multimodal medical reasoning.
  • Question Answering on Medical Images: Capable of answering questions based on provided medical images.
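A medical image QA call might look like the sketch below, following the standard Hugging Face loading path for Qwen2.5-VL-family models. Treat it as a sketch under assumptions: the class names follow the base model's documented `transformers` API, but the image path, question, and generation settings are placeholders, and `run_inference` is defined here without being executed (it requires the 7B weights and a suitable GPU).

```python
def build_messages(image_path: str, question: str) -> list:
    """Assemble a Qwen2.5-VL-style chat message with one image and a question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_inference(image_path: str, question: str) -> str:
    """Illustrative inference helper; downloads the model and needs a GPU."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model_id = "UCSC-VLAA/MedVLThinker-7B-SFT_PMC"
    processor = AutoProcessor.from_pretrained(model_id)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = build_messages(image_path, question)
    # Render the chat template, then tokenize text and image together.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[text], images=[Image.open(image_path)], return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
```

The response would then carry the `<think>`/`<answer>` structure described under Key Capabilities.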

This model is released under the Apache 2.0 license. More details and code can be found on the UCSC-VLAA/MedVLThinker GitHub page.