medalpaca/medalpaca-7b

7B · FP8 · 4096 · License: cc

MedAlpaca-7B: A Medical Domain LLM

MedAlpaca-7B is a 7-billion-parameter large language model built on LLaMA and fine-tuned specifically for medical domain tasks. Its primary goal is to improve performance on medical question answering and dialogue generation.
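For orientation, here is a minimal usage sketch with the Hugging Face transformers text-generation pipeline. The Context/Question/Answer template is illustrative, not a format confirmed by this card; check the upstream model card for the exact prompt layout used in training.

```python
from transformers import pipeline

# Load the model with the standard transformers text-generation
# pipeline (assumes enough memory; add device_map or quantization
# options as needed for your hardware).
qa_pipeline = pipeline(
    "text-generation",
    model="medalpaca/medalpaca-7b",
    tokenizer="medalpaca/medalpaca-7b",
)

# Illustrative prompt template (an assumption, not the documented one).
question = "What are the symptoms of diabetes?"
context = "Diabetes is a metabolic disease characterized by high blood sugar."
prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer: "

result = qa_pipeline(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```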

Key Capabilities & Training

This model was trained on a diverse dataset tailored to medical applications. Key data sources include the following; a sketch of how such Q&A pairs can be flattened into a single training format appears after the list:

  • ChatDoctor: A large dataset of 200,000 medical question-answer pairs.
  • Wikidoc: Medical Q&A pairs generated from encyclopedia paragraphs, with section headings rephrased as questions.
  • Anki Flashcards: Automatically generated questions and answers from medical flashcards.
  • StackExchange: Top-rated Q&A pairs from Academia, Bioinformatics, Biology, Fitness, and Health categories.
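To make the training setup concrete, below is a minimal sketch of how heterogeneous Q&A sources like these are typically merged into one instruction-tuning schema. The field names, template, and helper functions are hypothetical illustrations, not the project's actual preprocessing code.

```python
from typing import Iterable

def to_training_example(question: str, answer: str) -> dict:
    """Wrap a raw Q&A pair in an Alpaca-style prompt/completion record.
    The template here is an assumption for illustration."""
    prompt = f"Question: {question.strip()}\n\nAnswer: "
    return {"prompt": prompt, "completion": answer.strip()}

def build_dataset(sources: Iterable[Iterable[tuple[str, str]]]) -> list[dict]:
    """Merge several (question, answer) collections into one flat list,
    so differently structured sources share a single training format."""
    examples = []
    for source in sources:
        for question, answer in source:
            examples.append(to_training_example(question, answer))
    return examples

# Toy stand-ins for the real sources listed above.
chatdoctor = [("What causes anemia?", "Common causes include iron deficiency.")]
flashcards = [("Which nerve innervates the diaphragm?", "The phrenic nerve (C3-C5).")]

dataset = build_dataset([chatdoctor, flashcards])
print(dataset[0]["prompt"])
```

Normalizing every source into one prompt/completion schema is what lets a single fine-tuning run consume Q&A pairs of very different origins.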

Performance & Limitations

On the Open LLM Leaderboard, the model averages 44.98 across tasks, including 54.1 on ARC (25-shot), 80.42 on HellaSwag (10-shot), and 41.47 on MMLU (5-shot).

The model's competence is limited largely to the medical domain, and its knowledge is pitched at the level of medical students rather than board-certified physicians. It has not been tested in real-world clinical applications and should be treated as a research tool only, never as a substitute for professional medical advice.
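For readers unfamiliar with the "n-shot" terminology above, the sketch below assembles a few-shot prompt by prepending n worked examples before the test question. The exemplars are invented, and real leaderboard scores come from standardized evaluation harnesses rather than hand-built prompts like this one.

```python
def build_few_shot_prompt(
    exemplars: list[tuple[str, str]], test_question: str, n: int
) -> str:
    """Prepend n solved exemplars so the model can infer the task format,
    then leave the final answer blank for the model to complete."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars[:n]]
    blocks.append(f"Question: {test_question}\nAnswer:")
    return "\n\n".join(blocks)

# Invented exemplars, purely for illustration.
exemplars = [
    ("Which organ produces insulin?", "The pancreas."),
    ("What is the normal adult resting heart rate?", "About 60-100 beats per minute."),
]
print(build_few_shot_prompt(exemplars, "Which vitamin deficiency causes scurvy?", n=2))
```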