BioMedLM-7B: A Specialized Biomedical Language Model

BioMedLM-7B, developed by PharMolix, is a 7-billion-parameter generative language model built for the biomedical domain. It is fine-tuned from Llama2-7B-Chat on millions of biomedical papers from the S2ORC corpus. With this specialized training, BioMedLM-7B performs comparably to or better than much larger general-purpose foundation models on several biomedical question-answering benchmarks.

Key Capabilities

Biomedical Expertise: Specialized in understanding and generating biomedical text.
Performance: Matches or outperforms much larger general-purpose models on several biomedical QA benchmarks.
Foundation: Fine-tuned from Llama2-7B-Chat.
Training Data: Fine-tuned on over 26 billion tokens of biomedical text, drawn from S2ORC papers identified by a PubMed Central (PMC) ID or a PubMed ID.

Good for

Biomedical Research: Assisting with information retrieval and question answering in biomedical contexts.
Academic Applications: Supporting research and development within the life sciences.
Integration into BioMedGPT-10B: Serves as the generative language model component of the larger BioMedGPT-10B multimodal system, which integrates natural language with diverse biomedical data modalities. More details can be found in the technical report.

Note: This model is intended for research and development only. It should not be used to provide services to the general public, given its specialized nature and the potential for misuse in sensitive domains.
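
Example

For the question-answering use described above, the following is a minimal sketch of how the model might be queried through the Hugging Face transformers library. It assumes the checkpoint is published in a format that AutoModelForCausalLM can load. The plain "Context / Question / Answer" prompt layout and the helper names build_prompt and generate_answer are illustrative assumptions, not a documented interface for this model. The repository name is left as a required argument because this card does not state it.

```python
from typing import Optional


def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Format a biomedical question (and an optional passage) as a plain prompt.

    Assumption: this layout is illustrative; the model card does not
    prescribe a prompt template.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    parts = []
    if context:
        parts.append(f"Context: {context.strip()}")
    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n".join(parts)


def generate_answer(question: str, model_id: str,
                    context: Optional[str] = None,
                    max_new_tokens: int = 128) -> str:
    """Load the checkpoint named by model_id and generate an answer.

    model_id is supplied by the caller (hypothetical here); the weights
    are downloaded the first time this runs.
    """
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = build_prompt(question, context)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

A call would look like generate_answer("Which gene is mutated in cystic fibrosis?", model_id=...), with model_id set to the repository name of the published checkpoint. In line with the note above, output from such a sketch is suited to research use and should be checked against primary sources.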
