CorticalStack/pastiche-crown-clown-7b-dare

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Concurrency cost: 1 · Published: Feb 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

CorticalStack/pastiche-crown-clown-7b-dare is a 7-billion-parameter language model from CorticalStack, produced with a DARE (DARE-TIES) merge of four distinct 7B models: bardsai/jaskier-7b-dpo-v5.6, mlabonne/AlphaMonarch-7B, mlabonne/NeuralMonarch-7B, and macadeliccc/MBX-7B-v3-DPO. The DARE technique lets the merge absorb abilities from its constituent models, with the goal of combining their strengths into a single versatile model for general language generation and understanding.


Model Overview

The model was built with the DARE (DARE-TIES) merging method and is a composite of four distinct 7B models:

  • bardsai/jaskier-7b-dpo-v5.6
  • mlabonne/AlphaMonarch-7B
  • mlabonne/NeuralMonarch-7B
  • macadeliccc/MBX-7B-v3-DPO

Key Capabilities

This model uses the DARE-TIES merge method, described in the paper "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch." DARE randomly drops a fraction of each fine-tuned model's delta parameters (the difference between its weights and the base model's) and rescales the survivors to preserve the expected update, while TIES resolves sign conflicts between the remaining deltas before they are combined; the result can inherit strengths from several donors without retraining. The merge configuration designates bardsai/jaskier-7b-dpo-v5.6 as the base model and applies per-model density and weight parameters to the other three models. The merge runs in bfloat16 and enables the int8_mask option.
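To make the drop-and-rescale step concrete, here is a minimal, illustrative Python sketch of DARE applied to a single parameter tensor. The `density` argument mirrors the density knob mentioned in the merge configuration; the function and the toy tensors are hypothetical and not taken from the actual merge pipeline.

```python
import torch

def dare(base: torch.Tensor, finetuned: torch.Tensor, density: float) -> torch.Tensor:
    """Drop And REscale (DARE) on one parameter tensor (illustrative only).

    `density` is the fraction of delta parameters kept; the rest are
    dropped, and survivors are rescaled by 1/density so the expected
    update matches the original delta.
    """
    delta = finetuned - base                            # task vector
    keep_mask = torch.bernoulli(torch.full_like(delta, density))
    return base + (delta * keep_mask) / density         # rescale survivors

# Toy demonstration on random tensors, not real model weights.
torch.manual_seed(0)
base_weights = torch.randn(4, 4)
finetuned_weights = base_weights + 0.1 * torch.randn(4, 4)
merged_weights = dare(base_weights, finetuned_weights, density=0.5)
```

In a DARE-TIES merge, this dropping step is applied to each donor model's delta before TIES-style sign election combines the sparsified deltas into the base model.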

When to Use This Model

Because it is constructed from multiple fine-tuned 7B models, pastiche-crown-clown-7b-dare suits general-purpose language generation and understanding tasks where a blend of its base models' capabilities is beneficial. It is also relevant for users exploring advanced merging techniques like DARE-TIES, which consolidate the performance characteristics of several specialized models into a single, more capable model.
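For readers who want to try the model, the following is a minimal loading-and-generation sketch using the standard Hugging Face transformers API. The prompt and generation settings are illustrative, and hardware with enough memory for a 7B bfloat16 model is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CorticalStack/pastiche-crown-clown-7b-dare"

# Load tokenizer and model; bfloat16 matches the dtype used in the merge.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt and generation settings.
prompt = "Explain model merging in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```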