Technoculture/BioMistral-Hermes-Dare

Task: Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Feb 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Technoculture/BioMistral-Hermes-Dare is a 7-billion-parameter language model created by Technoculture by merging BioMistral/BioMistral-7B-DARE and NousResearch/Nous-Hermes-2-Mistral-7B-DPO. It targets general language tasks with a focus on biomedical and conversational applications, and supports a 4096-token context length. By combining specialized biomedical knowledge with strong instruction-following, the merge aims for robust performance across text generation and understanding tasks, including medical and general reasoning benchmarks.


BioMistral-Hermes-Dare Overview

Technoculture/BioMistral-Hermes-Dare is a 7 billion parameter language model resulting from a merge of two distinct models: BioMistral/BioMistral-7B-DARE and NousResearch/Nous-Hermes-2-Mistral-7B-DPO. This linear merge combines the strengths of a biomedical-focused model with a DPO-tuned conversational model, aiming for a versatile and capable LLM.
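A linear merge like this is commonly produced with mergekit. The configuration below is an illustrative sketch only, not Technoculture's published recipe; the equal 0.5/0.5 weights and the float16 dtype are assumptions.

```yaml
# Hypothetical mergekit config for a linear merge of the two parent models.
# Equal weights and float16 are assumptions, not the published recipe.
models:
  - model: BioMistral/BioMistral-7B-DARE
    parameters:
      weight: 0.5
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```

With mergekit installed, a config of this shape is passed to its merge CLI to write the combined weights to an output directory.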

Key Capabilities

  • Biomedical Understanding: Inherits specialized knowledge from BioMistral-7B-DARE, suggesting proficiency in medical question answering and related tasks.
  • Instruction Following: Benefits from the DPO fine-tuning of Nous-Hermes-2-Mistral-7B-DPO, enhancing its ability to follow complex instructions and engage in coherent dialogue.
  • General Reasoning: Evaluated across a range of benchmarks including MMLU, TruthfulQA, GSM8K, ARC, HellaSwag, and Winogrande, indicating broad general reasoning capabilities.
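The Nous-Hermes-2-Mistral-7B-DPO parent is trained on the ChatML prompt format, so the merged model likely responds best to the same layout (an assumption; verify against the model's tokenizer configuration). A minimal single-turn prompt-builder sketch:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Format a single-turn conversation in ChatML, the prompt style
    used by the Nous-Hermes-2 parent (assumed to carry over to the merge)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Example: a biomedical question framed for the merged model.
prompt = build_chatml_prompt(
    "You are a careful biomedical assistant.",
    "What is the mechanism of action of metformin?",
)
print(prompt)
```

The resulting string can then be tokenized and passed to the model (for example via `AutoModelForCausalLM.generate` in transformers); keep the prompt plus generated tokens within the 4096-token context limit.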

Potential Use Cases

  • Medical Q&A Systems: Well suited to applications requiring accurate responses to biomedical queries.
  • Conversational AI: Suitable for chatbots and virtual assistants that need to understand and generate human-like text.
  • Research and Development: Can be used as a base model for further fine-tuning on specific domain tasks, particularly in healthcare or scientific fields.