perlthoughts/Chupacabra-7B-v2
perlthoughts/Chupacabra-7B-v2 is a 7 billion parameter language model developed by Ray Hernandez, built by merging Mistral-based models using the Spherical Linear Interpolation (SLERP) method. This technique aims to create a blended model that smoothly interpolates characteristics from its parent models, preserving distinct features better than traditional weight averaging. It is optimized for general language tasks, leveraging advanced training methods like DPO and SFT, and features an 8192 token context length.
Chupacabra-7B-v2: A SLERP-Merged Mistral Model
Chupacabra-7B-v2 is a 7 billion parameter language model developed by Ray Hernandez, distinguished by its use of the Spherical Linear Interpolation (SLERP) merging method. Unlike traditional weight averaging, SLERP interpolates along the shortest arc between parameter vectors, producing smoother transitions and better preserving the distinct characteristics of its parent Mistral-based models. This blending approach aims to combine the strengths of multiple high-performing models, while the underlying training pipeline incorporates Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT).
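The exact merge recipe is not published on this page, but the core SLERP operation is straightforward to sketch. Below is a minimal, illustrative per-tensor implementation in PyTorch, assuming two parent checkpoints that share an architecture; the function name, interpolation factor, and fallback threshold are assumptions for illustration, not the author's actual configuration:

```python
import torch

def slerp_tensor(w0: torch.Tensor, w1: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors of the same shape."""
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Normalize so the angle between the two weight vectors is well defined.
    u0 = v0 / (v0.norm() + eps)
    u1 = v1 / (v1.norm() + eps)
    dot = torch.clamp(torch.dot(u0, u1), -1.0, 1.0)
    theta = torch.acos(dot)
    # Nearly parallel tensors: plain linear interpolation is numerically safer.
    if theta < 1e-4:
        return ((1 - t) * v0 + t * v1).reshape(w0.shape)
    # SLERP proper: sine-weighted interpolation along the arc between the tensors.
    sin_theta = torch.sin(theta)
    s0 = torch.sin((1 - t) * theta) / sin_theta
    s1 = torch.sin(t * theta) / sin_theta
    return (s0 * v0 + s1 * v1).reshape(w0.shape)

# Applied per tensor across two state dicts, e.g.:
# merged = {k: slerp_tensor(sd0[k], sd1[k], t=0.5) for k in sd0}
```

Unlike a plain average, the sine weighting preserves the angular relationship between the parent weights, which is the property credited here with retaining each parent's characteristics.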
Key Capabilities & Features
- SLERP Merging: Utilizes Spherical Linear Interpolation for superior model blending, maintaining feature integrity.
- Mistral-based Architecture: Built upon the robust Mistral model family.
- Advanced Training: Incorporates state-of-the-art training methods like DPO and SFT for enhanced performance.
- General Language Tasks: Designed for a broad range of natural language processing applications.
- 8192 Token Context: Supports an 8K context window for processing longer inputs.
Performance Highlights
Evaluations on the Open LLM Leaderboard show competitive performance for its size:
- Avg.: 67.04
- AI2 Reasoning Challenge (25-shot): 65.19
- HellaSwag (10-shot): 83.39
- MMLU (5-shot): 63.60
- TruthfulQA (0-shot): 57.17
- Winogrande (5-shot): 78.14
- GSM8k (5-shot): 54.74
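The reported average is simply the arithmetic mean of the six benchmark scores, which can be verified directly:

```python
scores = [65.19, 83.39, 63.60, 57.17, 78.14, 54.74]
print(round(sum(scores) / len(scores), 2))  # 67.04
```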
Prompt Template
The model uses the ChatML instruction format:
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
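Since the template follows the ChatML convention, a prompt can be assembled with plain string formatting. The following is a minimal usage sketch with the Hugging Face transformers library; the system message, prompt, and generation settings are illustrative assumptions, not recommendations from the model author:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "perlthoughts/Chupacabra-7B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

system = "You are a helpful assistant."  # illustrative system message
prompt = "Summarize what SLERP merging does in two sentences."

# Fill in the ChatML template shown above.
text = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{prompt}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Depending on the tokenizer config, you may also need to pass the
# <|im_end|> token id as eos_token_id so generation stops cleanly.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```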
Good for
- Developers seeking a 7B parameter model with a unique merging methodology.
- Applications requiring a balance of performance and efficiency for general language tasks.
- Experimentation with models built using advanced interpolation techniques.