arcee-ai/BioMistral-merged-zephyr

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Feb 27, 2024 · Architecture: Transformer

arcee-ai/BioMistral-merged-zephyr is a 7-billion-parameter language model, merged with the TIES method using mistralai/Mistral-7B-v0.1 as its base. The merge combines BioMistral/BioMistral-7B and HuggingFaceH4/zephyr-7b-beta, yielding a model suited to both biomedical question answering and general conversational tasks. It has a 4,096-token context length and performs well on medical benchmarks such as PubMedQA and MedMCQA, as well as on general reasoning tasks.


Overview

arcee-ai/BioMistral-merged-zephyr is a 7-billion-parameter language model created by arcee-ai by merging existing pre-trained models. It uses the TIES merge method with mistralai/Mistral-7B-v0.1 as its foundational base. The merge combines the strengths of BioMistral/BioMistral-7B, known for its biomedical domain expertise, and HuggingFaceH4/zephyr-7b-beta, recognized for its conversational and instruction-following capabilities.
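As a quick-start reference, the snippet below is a minimal sketch of loading the model with Hugging Face transformers. The float16 dtype, device mapping, and prompt are illustrative assumptions; the card does not prescribe a specific inference setup.

```python
# Minimal loading sketch (assumed setup, not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/BioMistral-merged-zephyr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumed; choose a dtype your hardware supports
    device_map="auto",
)

prompt = "Question: What is the primary function of hemoglobin?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```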

Key Capabilities

  • Biomedical Question Answering: Achieves 76.8 on PubMedQA and 47.21 on MedMCQA, indicating proficiency in medical domain understanding (see the evaluation sketch after this list).
  • General Reasoning: Scores 55.89 on ARC-Challenge and 63.43 on HellaSwag, two common general-knowledge benchmarks.
  • Instruction Following: Inherits instruction-following ability from the Zephyr component.
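The card does not state the exact harness or settings behind these numbers. The sketch below shows how comparable scores would typically be gathered with EleutherAI's lm-evaluation-harness; the task names, dtype, and default few-shot settings are assumptions and may not match the original evaluation.

```python
# Hedged evaluation sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and settings are assumptions; they may differ from the original run.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=arcee-ai/BioMistral-merged-zephyr,dtype=float16",
    tasks=["pubmedqa", "medmcqa", "arc_challenge", "hellaswag"],
)
print(results["results"])  # per-task metrics
```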

Merge Details

The model was constructed using mergekit, with per-model density and weight parameters applied to the merged components. The configuration used a float16 dtype and enabled int8_mask. The exact parameter values are not reproduced here; a sketch of what such a configuration looks like follows.
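The snippet below writes an illustrative TIES configuration in the shape mergekit expects. The density and weight values are placeholders, not the values arcee-ai actually used; only the merge method, base model, source models, dtype, and int8_mask flag come from the card.

```python
# Illustrative mergekit TIES config; density/weight values are placeholders.
import pathlib

config = """\
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: BioMistral/BioMistral-7B
    parameters:
      density: 0.5   # placeholder, not the actual value
      weight: 0.5    # placeholder, not the actual value
  - model: HuggingFaceH4/zephyr-7b-beta
    parameters:
      density: 0.5   # placeholder, not the actual value
      weight: 0.5    # placeholder, not the actual value
parameters:
  int8_mask: true
dtype: float16
"""

pathlib.Path("ties_merge.yml").write_text(config)
# The merge would then be run with mergekit's CLI, e.g.:
#   mergekit-yaml ties_merge.yml ./BioMistral-merged-zephyr
```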