Azazelle/Sina-Thor-7b-Merge

Text generation · 7B parameters · FP8 quantization · 4K context length · Published: Jan 11, 2024 · License: cc-by-4.0 · Architecture: Transformer · Open weights

Azazelle/Sina-Thor-7b-Merge is a 7 billion parameter experimental language model based on the Mistral-7B-v0.1 architecture, created through a DARE (Drop And REscale) merge. The merge integrates rishiraj/smol-7b, SanjiWatsuki/openchat-3.5-1210-starling-slerp, and Azazelle/Dumb-Maidlet, and is intended for general language tasks that can benefit from the combined strengths of its source models.


Sina-Thor-7b-Merge: An Experimental DARE Merge

Sina-Thor-7b-Merge is a 7 billion parameter language model developed by Azazelle, built on the Mistral-7B-v0.1 base. It is an experimental DARE (Drop And REscale) merge that combines several fine-tuned models with the aim of improving overall performance and capabilities.

Key Merge Components:

  • Base Model: mistralai/Mistral-7B-v0.1
  • Merged Models:
    • rishiraj/smol-7b (weight: 0.2, density: 0.41)
    • SanjiWatsuki/openchat-3.5-1210-starling-slerp (weight: 0.33, density: 0.54)
    • Azazelle/Dumb-Maidlet (weight: 0.53, density: 0.71)

Technical Details:

The merge uses the dare_ties method with int8_mask enabled (intermediate masks are stored in 8-bit to reduce memory use during merging) and bfloat16 as the output data type. Each source model contributes according to its weight, while its density controls what fraction of its delta from the base model is retained before rescaling.
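
To illustrate the idea, the sketch below shows the DARE drop-and-rescale step on a single parameter tensor, using the weights and densities listed above. It is a simplified illustration only: it omits the TIES sign-consensus step that dare_ties also performs, it is not the actual merge tooling implementation, and the tensor shapes in the toy example are placeholders.

```python
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: randomly drop a fraction (1 - density) of the task vector, rescale the rest."""
    delta = finetuned - base                                  # task vector for one tensor
    mask = torch.bernoulli(torch.full_like(delta, density))   # keep each entry with prob = density
    return delta * mask / density                             # rescale so the expected delta is unchanged

# Toy example with random tensors standing in for real model weights.
base = torch.randn(8, 8)
sources = [
    # (stand-in fine-tuned tensor, merge weight, density) -- values from the card above
    (base + 0.1 * torch.randn(8, 8), 0.20, 0.41),  # rishiraj/smol-7b
    (base + 0.1 * torch.randn(8, 8), 0.33, 0.54),  # openchat-3.5-1210-starling-slerp
    (base + 0.1 * torch.randn(8, 8), 0.53, 0.71),  # Azazelle/Dumb-Maidlet
]

merged = base.clone()
for finetuned, weight, density in sources:
    merged += weight * dare_delta(base, finetuned, density)
```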

Good for:

  • Experimentation with DARE merges: Ideal for researchers and developers interested in exploring the effects and performance of DARE merging techniques.
  • General language generation: Suitable for a variety of text-based tasks, benefiting from the diverse origins of its merged components (see the loading sketch after this list).
  • Building upon Mistral-7B: Offers a modified base for projects that typically use Mistral-7B, potentially providing different response characteristics.
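
As a starting point, here is a minimal loading and generation sketch using Hugging Face transformers. It assumes the repository ships a standard Mistral-style causal LM checkpoint compatible with AutoModelForCausalLM; the prompt and generation settings are arbitrary examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Azazelle/Sina-Thor-7b-Merge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the merge's bfloat16 dtype
    device_map="auto",            # requires the accelerate package
)

prompt = "Explain what a DARE model merge is in one short paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```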