lighteternal/Llama3-merge-biomed-8b

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8K · License: llama3 · Architecture: Transformer

lighteternal/Llama3-merge-biomed-8b is an 8-billion-parameter language model based on the Llama 3 architecture, created through a DARE-TIES merge of Meta-Llama-3-8B-Instruct, NousResearch/Hermes-2-Pro-Llama-3-8B, and aaditya/Llama3-OpenBioLLM-8B. The merge is oriented toward biomedical tasks, showing improved accuracy on HendrycksTest (MMLU) categories such as Anatomy, Clinical Knowledge, and Professional Medicine, along with gains on complex-reasoning benchmarks such as ARC Challenge and Winogrande. It targets applications that require both general language understanding and specialized biomedical knowledge, and supports a context length of 8,192 tokens.


Overview

The lighteternal/Llama3-merge-biomed-8b is an 8 billion parameter language model resulting from a DARE-TIES merge of three distinct Llama 3-based models: Llama3-8b-Instruct, NousResearch/Hermes-2-Pro-Llama-3-8B, and aaditya/Llama3-OpenBioLLM-8B. This experimental merge aims to combine the strengths of general language understanding with specialized biomedical knowledge.

Key Capabilities & Performance

The model demonstrates promising performance, particularly in biomedical domains and complex reasoning tasks, as evidenced by its scores on the Hugging Face Open LLM Leaderboard:

  • Biomedical Expertise: Achieves significantly higher accuracy on various HendrycksTest tasks, including:
    • Anatomy: 72.59% (vs. 65.19% for Llama3-8B-Instruct)
    • Clinical Knowledge: 77.83% (vs. 74.72%)
    • College Biology: 81.94% (vs. 79.86%)
    • Medical Genetics: 86.00% (vs. 80.00%)
    • Professional Medicine: 77.94% (vs. 71.69%)
  • Enhanced Reasoning: Shows improvements in general reasoning benchmarks:
    • ARC Challenge: 59.39% Accuracy (vs. 57.17% for Llama3-8B-Instruct)
    • Winogrande: 75.93% Accuracy (vs. 74.51%)
  • General Understanding: Also performs well on Hellaswag with 62.59% Accuracy.

Merge Details

This model was created with the DARE-TIES merge method, using meta-llama/Meta-Llama-3-8B-Instruct as the base model. The configuration assigns each merged component its own density and weight parameters to balance their contributions; a sketch of what such a configuration typically looks like follows.
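As an illustration only, the snippet below writes a mergekit-style DARE-TIES configuration of the kind commonly used for merges like this one. The density and weight values, dtype, and file names are placeholders rather than the author's actual settings; consult the original model card for the real configuration.

```python
# Illustrative sketch: a mergekit-style DARE-TIES config for this merge.
# All density/weight values below are placeholders, NOT the values used by
# the model author.
from pathlib import Path

MERGE_CONFIG = """\
merge_method: dare_ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
models:
  - model: NousResearch/Hermes-2-Pro-Llama-3-8B
    parameters:
      density: 0.5   # placeholder: fraction of fine-tuned deltas kept after pruning
      weight: 0.5    # placeholder: relative contribution to the merged weights
  - model: aaditya/Llama3-OpenBioLLM-8B
    parameters:
      density: 0.5   # placeholder
      weight: 0.5    # placeholder
"""

# Write the config to disk; mergekit's CLI (e.g. `mergekit-yaml merge-config.yaml ./out`)
# would then perform the actual merge.
Path("merge-config.yaml").write_text(MERGE_CONFIG)
```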

Recommended Usage

Users should follow the Llama 3 Instruct prompt template (the chat format delimited by the <|start_header_id|> and <|eot_id|> special tokens) to ensure optimal performance.
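A minimal sketch of applying that template via the transformers chat-template workflow is shown below; it assumes the merged model ships the standard Llama 3 tokenizer and chat template, and the dtype/device settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lighteternal/Llama3-merge-biomed-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful biomedical assistant."},
    {"role": "user", "content": "Briefly explain what a ligand is."},
]

# apply_chat_template wraps the conversation in the Llama 3 special tokens
# (<|start_header_id|> ... <|eot_id|>) so the model sees the expected format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```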

Popular Sampler Settings

Featherless users most commonly tune the following sampler parameters for this model: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
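Purely as an illustration of how such settings could be supplied to an OpenAI-compatible chat endpoint, here is a hedged sketch; the endpoint URL is an assumption and every numeric value is a placeholder, not one of the actual top user configurations.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumption: OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="lighteternal/Llama3-merge-biomed-8b",
    messages=[{"role": "user", "content": "List three functions of the liver."}],
    temperature=0.7,        # placeholder value
    top_p=0.9,              # placeholder value
    frequency_penalty=0.0,  # placeholder value
    presence_penalty=0.0,   # placeholder value
    # Parameters outside the OpenAI spec (top_k, min_p, repetition_penalty)
    # are passed through extra_body on servers that support them.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.1},
)
print(response.choices[0].message.content)
```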