brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity
brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity is a 34 billion parameter language model created by brucethemoose, built upon the Yi-34B-200K base. This model is a high-density DARE Ties merge of six distinct fine-tuned models, including Dolphin-2.2, Nous-Capybara, Tess-M, Airoboros, PlatYi, and Una-xaberius. It features an extended context length of 200,000 tokens and is optimized for enhanced perplexity and performance across various benchmarks, achieving an average score of 72.15 on the Open LLM Leaderboard.
Loading preview...
Model Overview
This model, CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity, is a 34 billion parameter language model developed by brucethemoose. It is a unique high-density merge of six different fine-tuned models: Dolphin-2.2-yi-34b-200k, Nous-Capybara-34B, Tess-M-v1.4, Airoboros-3_1-yi-34b-200k, PlatYi-34B-200K-Q, and Una-xaberius-34b-v1beta. The merge utilizes an experimental "dare ties" implementation via mergekit, aiming to absorb abilities from homologous models. It maintains the Yi-34B-200K base's impressive 200,000 token context window.
Key Characteristics
- DARE Ties Merge: Employs a novel, high-density DARE Ties merging strategy, which has shown improved perplexity and leaderboard performance compared to regular ties, task arithmetic, or slerp merges.
- Extended Context: Inherits the 200,000 token context length from the Yi-34B-200K base, making it suitable for long-form content generation and analysis.
- Benchmark Performance: Achieves an average score of 72.15 on the Open LLM Leaderboard, with notable scores including 77.44 on MMLU and 85.77 on HellaSwag.
- Prompt Template Flexibility: Recognizes Orca-Vicuna, ChatML, and Llama-chat prompt formats due to its merged components.
Usage Recommendations
- Yi-Specific Settings: For optimal performance, it's recommended to disable the BOS token and use lower temperatures (0.05-0.13 MinP) with a slight repetition penalty, as Yi models tend to run "hot."
- Hardware: Can run 45K-75K context on 24GB GPUs using
exllamav2. - Quantization:
exl2quantizations profiled on task-similar data are highly recommended, especially at low bits-per-weight (bpw). - Full-Context Backends: For
transformersandvllm,max_position_embeddingsinconfig.jsonmust be reduced from 200,000 to avoid Out-of-Memory errors.