occultml/CatMarcoro14-7B-slerp: A Merged 7B Language Model
CatMarcoro14-7B-slerp is a 7-billion-parameter model developed by occultml, created through a slerp (spherical linear interpolation) merge of two base models: cookinai/CatMacaroni-Slerp and EmbeddedLLM/Mistral-7B-Merge-14-v0.2. Both source models build on the Mistral architecture, and the merge aims to combine their strengths in a single checkpoint.
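To make the merge method concrete, the sketch below shows what spherical linear interpolation of two weight tensors looks like in principle. This is a conceptual illustration only, not the exact routine used to produce this model (merge tooling such as mergekit implements the production version); the function, its epsilon handling, and the colinearity fallback are our own simplifications.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Treats the flattened tensors as vectors and interpolates along the
    great-circle arc between them, falling back to plain linear
    interpolation when the vectors are nearly parallel.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)

    # Angle between the two weight vectors.
    omega = torch.acos((a_unit * b_unit).sum().clamp(-1 + eps, 1 - eps))

    if omega.abs() < 1e-4:
        # Nearly colinear: slerp degenerates to lerp.
        merged = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1 - t) * omega) / so) * a_flat \
               + (torch.sin(t * omega) / so) * b_flat

    return merged.reshape(a.shape).to(a.dtype)
```

Applied tensor by tensor with a layer-dependent t, this kind of interpolation produces the blended weights described in the configuration section below.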
Key Capabilities & Performance
This model demonstrates solid performance across a range of benchmarks on the Hugging Face Open LLM Leaderboard, averaging 73.25 across the six standard evaluations:
- Reasoning: 69.37 on the AI2 Reasoning Challenge (25-shot) and 73.01 on GSM8k (5-shot).
- Common sense: 86.92 on HellaSwag (10-shot) and 81.69 on Winogrande (5-shot).
- Language understanding: 65.27 on MMLU (5-shot) and 63.24 on TruthfulQA (0-shot).
Configuration and Usage
The model was constructed using a slerp merge in which the interpolation parameter (t) varies across layers, with separate schedules for the self_attn and mlp sub-layers, to tune how the two source models are blended. It supports a context length of 4096 tokens. Developers can integrate the model for text generation tasks using the Hugging Face transformers library, as sketched in the example below.
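A minimal, self-contained sketch of that usage pattern follows. It uses the standard transformers text-generation flow; the prompt and sampling settings are illustrative choices rather than values recommended by the model author, and device_map="auto" additionally requires the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "occultml/CatMarcoro14-7B-slerp"

# Load the tokenizer and model; device_map="auto" places the weights
# on available GPUs (requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Explain spherical linear interpolation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; tune them for your task.
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```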