m42-health/Llama3-Med42-70B

Text generation · Model size: 70B · Quantization: FP8 · Context length: 8K · Published: Jun 27, 2024 · License: llama3 · Architecture: Transformer · Concurrency cost: 4

Llama3-Med42-70B is a 70-billion-parameter clinical large language model developed by the M42 Health AI Team and fine-tuned from Llama3. It is designed to provide high-quality answers to medical questions, excelling at medical multiple-choice question answering (MCQA) and achieving top performance on the Clinical Elo Rating Leaderboard. The model processes text-only input with an 8192-token context length, making it suitable for medical question answering, patient record summarization, and aiding medical diagnosis.


Med42-v2: A Clinically-Aligned Llama3-70B Model

Med42-v2-70B is a 70-billion-parameter large language model developed by the M42 Health AI Team and instruction- and preference-tuned specifically for medical applications. Built on the Llama3 architecture, it aims to expand access to medical knowledge through high-quality generative AI.

Key Capabilities & Performance:

  • Superior Medical MCQA Performance: Outperforms GPT-4.0 in most multiple-choice question answering tasks.
  • State-of-the-Art MedQA: Achieves 79.10% zero-shot accuracy on MedQA, surpassing other openly available medical LLMs.
  • Top Clinical Elo Rating: Ranks highest on the Clinical Elo Rating Leaderboard with a score of 1764, significantly outperforming Llama3-70B-Instruct and GPT-4o.
  • Instruction-Tuned: Fine-tuned on approximately 1 billion tokens from diverse open-access medical sources, including flashcards, exam questions, and dialogues.
  • 8K Context Length: Supports an 8192-token context window for processing medical text.
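Because the context window is fixed at 8192 tokens, long inputs such as full patient records may need truncation before they reach the model. Below is a minimal pre-flight budget check; the ~4-characters-per-token ratio is a rough heuristic assumption for English text (for exact counts, use the model's own tokenizer), and the function names are illustrative:

```python
# Rough pre-flight check that a prompt fits Llama3-Med42-70B's 8192-token
# context window while leaving room for the generated answer.
# NOTE: CHARS_PER_TOKEN = 4 is a heuristic assumption, not the real
# tokenizer; use the model's tokenizer for exact token counts.

CTX_LEN = 8192          # model context window, in tokens
CHARS_PER_TOKEN = 4     # rough heuristic for English prose

def fits_context(prompt: str, max_new_tokens: int = 512) -> bool:
    """Return True if the prompt plus the reply budget likely fits."""
    est_prompt_tokens = len(prompt) / CHARS_PER_TOKEN
    return est_prompt_tokens + max_new_tokens <= CTX_LEN

def truncate_to_budget(prompt: str, max_new_tokens: int = 512) -> str:
    """Trim the prompt tail so the estimated total stays within the window."""
    budget_chars = (CTX_LEN - max_new_tokens) * CHARS_PER_TOKEN
    return prompt[:budget_chars]

# Example: an oversized pseudo-record gets trimmed to fit.
record = "Patient presents with..." * 2000
if not fits_context(record):
    record = truncate_to_budget(record)
assert fits_context(record)
```

A production pipeline would truncate on token boundaries with the actual Llama 3 tokenizer rather than on characters, but the budget arithmetic is the same.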

Intended Use Cases:

  • Medical question answering
  • Patient record summarization
  • Aiding medical diagnosis
  • General health Q&A
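Since the model is instruction-tuned for chat, requests for the use cases above are typically framed as Llama 3-style chat messages with system and user roles. A minimal sketch of such a payload; the system prompt wording here is an illustrative assumption, not the official one:

```python
# Build a chat-style request for a single medical Q&A turn.
# The message schema (a list of {"role", "content"} dicts) is the common
# convention for Llama 3 chat templates and OpenAI-compatible APIs;
# the system prompt text is an illustrative assumption.

def build_medical_qa_messages(question: str) -> list[dict]:
    system_prompt = (
        "You are a helpful medical assistant. Answer medical questions "
        "accurately and note when a clinician should be consulted."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

messages = build_medical_qa_messages(
    "What are first-line treatments for community-acquired pneumonia?"
)
```

These messages would then be passed to a chat-completion endpoint, or rendered into a prompt string with the tokenizer's chat template, before generation.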

Important Limitations:

  • Not for Clinical Use: The model is not ready for real clinical use and requires extensive human evaluation and safety testing.
  • Potential for Harm: May generate incorrect or harmful information and carries a risk of perpetuating biases from training data.

For more details, refer to the research paper.

Popular Sampler Settings

The three parameter combinations most used by Featherless users for this model adjust: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.