Llama3-8B-merge-biomed-wizard Overview
This model, developed by chenjingshen, is an 8-billion-parameter language model created by merging three Llama 3-based models with the DARE-TIES method: meta-llama/Meta-Llama-3-8B-Instruct, NousResearch/Hermes-2-Pro-Llama-3-8B, and aaditya/Llama3-OpenBioLLM-8B. The merge was implemented with MindNLP Wizard on a MindSpore/Ascend runtime stack, with bfloat16 as the output dtype.
Key Capabilities and Performance
The model is designed to enhance performance in both general reasoning and specialized biomedical domains. It shows competitive or superior results compared to its base models across several benchmarks:
- General Reasoning: Achieves 70.81% accuracy on GSM8K and 76.01% on Winogrande, outperforming Meta-Llama-3-8B-Instruct on both.
- Biomedical Expertise: Demonstrates strong performance on MMLU biomedical subsets, including 77.57% on MMLU-Professional Medicine and 82.00% on MMLU-Medical Genetics.
- Merge Method: Uses DARE-TIES, which prunes and rescales delta weights to reduce interference between merged models, with meta-llama/Meta-Llama-3-8B-Instruct as the base.
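A merge of this shape is commonly written as a mergekit-style YAML recipe. The sketch below is an illustration under assumed settings, not the configuration actually used (the real merge was run with MindNLP Wizard, and the density/weight values here are placeholders):

```yaml
# Hypothetical mergekit-style DARE-TIES recipe; density and weight
# values are illustrative placeholders, not the model's actual settings.
models:
  - model: NousResearch/Hermes-2-Pro-Llama-3-8B
    parameters:
      density: 0.5   # fraction of delta weights kept by DARE pruning
      weight: 0.5    # contribution of this model's deltas to the merge
  - model: aaditya/Llama3-OpenBioLLM-8B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: dare_ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
```

The base model's weights are kept as the reference point; only the pruned, sign-reconciled deltas from the other two models are added on top of it.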
When to Use This Model
- Biomedical Applications: Ideal for tasks requiring strong understanding and generation in medical, biological, and clinical contexts.
- General-Purpose Reasoning: Suitable for applications needing robust logical and mathematical reasoning capabilities.
- Llama 3 Ecosystem: Integrates seamlessly with the Llama 3 prompt format, making it easy to adopt for existing Llama 3 users.
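Since the model keeps the standard Llama 3 instruct template, a single-turn prompt can be assembled from the usual header tokens. A minimal sketch (the helper name is our own; in practice `tokenizer.apply_chat_template` from transformers produces the same layout automatically):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 instruct format.

    Helper name is illustrative; transformers' apply_chat_template
    builds this layout for you from a list of chat messages.
    """
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        # Trailing assistant header cues the model to start generating.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful biomedical assistant.",
    "Which gene is most commonly mutated in cystic fibrosis?",
)
```

Generation should stop on the `<|eot_id|>` token, as with any Llama 3 instruct model.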