abacusai/Slerp-CM-mist-dpo

7B parameters · FP8 · 8192 context length · License: apache-2.0

Model Overview

abacusai/Slerp-CM-mist-dpo is a 7 billion parameter language model from abacusai, created through a SLERP (spherical linear interpolation) merge of two existing models: cookinai/CatMacaroni-Slerp and mncai/mistral-7b-dpo-v5. The merge is intended to combine the strengths of both parent models, in particular their complementary benchmark performance.
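
The model can be loaded like any other causal language model. Below is a minimal sketch assuming the standard transformers AutoModelForCausalLM interface; the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Slerp-CM-mist-dpo"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available devices (requires accelerate)
)

prompt = "Explain spherical linear interpolation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```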

Key Capabilities & Performance

This model achieves an average score of 73.1 on the Hugging Face Open LLM Leaderboard benchmarks. Notably, it improves on TruthfulQA (62.82) relative to cookinai/CatMacaroni-Slerp and on GSM8K (72.78) relative to mncai/mistral-7b-dpo-v5, the two models it was merged from. The remaining benchmark results are ARC (69.62), HellaSwag (87.09), MMLU (64.81), and Winogrande (81.45).

Training Details

The model was created with a SLERP merge, applying separate interpolation weights to the self-attention and MLP layers of the two source models. The base model for the merge was mncai/mistral-7b-dpo-v5.
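
The exact merge configuration is not reproduced here, but the sketch below shows how a SLERP merge interpolates between two parent checkpoints, assuming torch tensors for the weights. The per-layer interpolation factors in `interpolation_factor` are illustrative placeholders, not the values actually used for this model.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate values follow the
    great-circle arc between the flattened tensors.
    """
    v0_f = v0.flatten().float()
    v1_f = v1.flatten().float()
    # Normalize to unit vectors to measure the angle between them.
    v0_n = v0_f / (v0_f.norm() + eps)
    v1_n = v1_f / (v1_f.norm() + eps)
    dot = torch.clamp(torch.dot(v0_n, v1_n), -1.0, 1.0)
    theta = torch.acos(dot)
    if theta.abs() < 1e-4:
        # Nearly colinear tensors: fall back to plain linear interpolation,
        # since sin(theta) would be numerically unstable.
        merged = (1.0 - t) * v0_f + t * v1_f
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * v0_f \
               + (torch.sin(t * theta) / sin_theta) * v1_f
    return merged.reshape(v0.shape).to(v0.dtype)

def interpolation_factor(param_name: str) -> float:
    """Hypothetical per-layer weighting: lean toward one parent for
    self-attention weights and the other for MLP weights."""
    if "self_attn" in param_name:
        return 0.3
    if "mlp" in param_name:
        return 0.7
    return 0.5
```

In a full merge, each parameter of the base model (mncai/mistral-7b-dpo-v5) would be combined with the corresponding parameter of the other parent via `slerp(interpolation_factor(name), base_weight, other_weight)`.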

Limitations

This model has not undergone safety evaluations and is intended solely for research and experimental purposes. Users should be aware of potential biases and risks inherent in large language models.