Gille/StrangeMerges_45-7B-dare_ties
Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Mar 25, 2024 · License: apache-2.0 · Architecture: Transformer

Gille/StrangeMerges_45-7B-dare_ties is a 7 billion parameter language model created by Gille, formed by merging four distinct models using the dare_ties method via LazyMergekit. This model integrates components from MetaMath-Cybertron-Starling, BetterSaul-7B-slerp, T3Q-Mistral-Orca-Math-DPO, and Mistral-7B-Merge-14-v0.2, aiming to combine their respective strengths. With a 4096-token context length, it is designed for general text generation tasks, leveraging its merged architecture for potentially enhanced performance across various domains.


Overview

StrangeMerges_45-7B-dare_ties is a 7 billion parameter language model developed by Gille. It is a product of merging four different base models using the dare_ties method, facilitated by LazyMergekit. This merging approach combines the strengths of several specialized models to create a more versatile and capable language model.

Merged Components

This model is a composite of the following individual models, each contributing at a specific weight and density:

  • Q-bert/MetaMath-Cybertron-Starling: Contributes 30% weight.
  • ozayezerceli/BetterSaul-7B-slerp: Contributes 20% weight.
  • chihoonlee10/T3Q-Mistral-Orca-Math-DPO: Contributes 40% weight.
  • EmbeddedLLM/Mistral-7B-Merge-14-v0.2: Contributes 10% weight.

Configuration Details

The merge process utilized dare_ties as the merging method and was built upon Gille/StrangeMerges_44-7B-dare_ties as the base model. The model operates with bfloat16 data type, indicating a balance between performance and memory efficiency. Its 4096-token context length allows for processing moderately long inputs and generating coherent responses.
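Putting the pieces together, a LazyMergekit configuration consistent with the details above might look like the following sketch. The weights match the percentages listed earlier; the `density` values are illustrative placeholders, since the card text does not state them.

```yaml
# Sketch of a dare_ties merge config for LazyMergekit/mergekit.
# Weights come from the component list above; densities are assumed.
models:
  - model: Gille/StrangeMerges_44-7B-dare_ties
    # Base model: no parameters needed here.
  - model: Q-bert/MetaMath-Cybertron-Starling
    parameters:
      weight: 0.3
      density: 0.5   # illustrative; actual value not listed
  - model: ozayezerceli/BetterSaul-7B-slerp
    parameters:
      weight: 0.2
      density: 0.5   # illustrative
  - model: chihoonlee10/T3Q-Mistral-Orca-Math-DPO
    parameters:
      weight: 0.4
      density: 0.5   # illustrative
  - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.2
    parameters:
      weight: 0.1
      density: 0.5   # illustrative
merge_method: dare_ties
base_model: Gille/StrangeMerges_44-7B-dare_ties
dtype: bfloat16
```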

Usage

Developers can integrate StrangeMerges_45-7B-dare_ties into their applications using the Hugging Face transformers library: load the tokenizer, apply its chat template to format conversational prompts, and run generation with the model.
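A minimal sketch of that workflow using the standard transformers text-generation pipeline follows. The sampling parameters (`temperature`, `top_k`, `top_p`) are illustrative defaults, not values specified by the model card.

```python
# Sketch: load the merged model and generate a chat-formatted completion.
import torch
import transformers
from transformers import AutoTokenizer

model = "Gille/StrangeMerges_45-7B-dare_ties"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Build a text-generation pipeline in bfloat16, matching the merge dtype.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate a response; sampling parameters here are illustrative.
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```

Note that the prompt stays within the model's 4096-token context window; longer conversations should be truncated before generation.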