brucethemoose/Yi-34B-200K-DARE-merge-v5
brucethemoose/Yi-34B-200K-DARE-merge-v5 is a 34-billion-parameter language model based on the Yi architecture, with an extended 200K-token context window. It is a merge of several Yi-based finetunes, including Nous-Capybara, Tess-M, Airoboros, and PlatYi, made with an experimental "dare ties" merging method. It targets long-context tasks and general conversation, and averages 71.98 on the Open LLM Leaderboard.
Overview
brucethemoose/Yi-34B-200K-DARE-merge-v5 is a 34-billion-parameter large language model built on the Yi architecture, with a 200,000-token context window. It merges several Yi-based finetunes: Nous-Capybara-34B, Tess-M-v1.4, Airoboros-3_1-yi-34b-200k, PlatYi-34B-200K-Q, Pallas-0.4, Yi-34B-200K-AEZAKMI-v2, and a small contribution from SUS-Chat-34B. The merge was performed with mergekit's experimental "dare ties" implementation. The technique comes from the "Language Models are Super Mario" paper and aims to let one model absorb abilities from homologous models, that is, finetunes of the same base.
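To make the merging idea concrete, here is a minimal sketch of the drop-and-rescale step that DARE is built on: take each finetune's delta from the base weights, randomly zero a fraction of it, and rescale the survivors by `1 / (1 - drop_rate)` so the expected delta is unchanged. This is not mergekit's implementation, and it leaves out the TIES sign-election step entirely. The function names, the flat lists standing in for weight tensors, and the weights and drop rate are all illustrative assumptions.

```python
import random


def dare_delta(base, finetuned, drop_rate, rng):
    """Return the finetune's delta after random drop and rescale.

    Each delta entry is zeroed with probability drop_rate; kept entries
    are multiplied by 1 / (1 - drop_rate) to preserve the expected value.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    result = []
    for b, f in zip(base, finetuned):
        delta = f - b
        keep = rng.random() >= drop_rate
        result.append(delta * scale if keep else 0.0)
    return result


def merge(base, finetunes, weights, drop_rate, seed=0):
    """Add the weighted, sparsified deltas of every finetune onto the base."""
    rng = random.Random(seed)
    merged = list(base)
    for finetuned, weight in zip(finetunes, weights):
        sparse = dare_delta(base, finetuned, drop_rate, rng)
        for i, d in enumerate(sparse):
            merged[i] += weight * d
    return merged


# Hypothetical toy "weights": one base and two finetunes of it.
base = [1.0, 2.0, 3.0, 4.0]
finetune_a = [1.5, 2.0, 2.0, 4.5]
finetune_b = [1.0, 3.0, 3.5, 4.0]
merged = merge(base, [finetune_a, finetune_b], [0.5, 0.5], drop_rate=0.5)
```

With `drop_rate=0.0` this reduces to a plain weighted sum of deltas; higher drop rates sparsify each finetune's contribution before the deltas are combined.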
Key Capabilities & Features
- Extended Context Window: Supports up to 200,000 tokens, making it suitable for processing and generating very long texts.
- Merged Intelligence: Combines the strengths of several specialized Yi finetunes, potentially enhancing its general reasoning and conversational abilities.
- Sampling Settings: Run at a lower temperature with MinP set to 0.02-0.1 and a slight repetition penalty, as Yi models tend to run "hot."
- Hardware Efficiency: Can run 45K-75K context on 24GB GPUs using `exllamav2` and UIs like `exui`.
- Benchmark Performance: Achieves an average score of 71.98 on the Open LLM Leaderboard, with notable scores in MMLU (77.22) and HellaSwag (85.54).
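For readers unfamiliar with MinP, the sketch below shows the filtering rule as it is commonly defined: a token survives only if its probability is at least `min_p` times the probability of the most likely token, and the survivors are then renormalized. The function name and the toy distribution are illustrative assumptions, and real backends apply this to the model's full vocabulary rather than a four-item list.

```python
def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    if not probs:
        raise ValueError("probs must not be empty")
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Toy next-token distribution (hypothetical values).
probs = [0.6, 0.3, 0.08, 0.02]

# With min_p = 0.1 the cutoff is 0.06, so only the last token is removed.
filtered = min_p_filter(probs, min_p=0.1)
```

Because the cutoff scales with the top token's probability, the filter prunes aggressively when the model is confident and leniently when it is not, which is why it suits a model with a large vocabulary that tends to run "hot."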
Usage Notes
- The model uses an Orca-Vicuna prompt template (`SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:`).
- Users might need to add `</s>` as an additional stopping condition, as the model can sometimes spell out the stop token.
- For full-context backends like `transformers`, `max_position_embeddings` in `config.json` must be lowered from 200,000 to avoid Out-Of-Memory errors.
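The three notes above can be sketched as small helpers using only the standard library. This is an illustration under assumptions, not code from the model card: the function names are hypothetical, the `\n` in the template is assumed to mean a literal newline, and the replacement context length of 32768 is an arbitrary example value to be chosen according to available memory.

```python
import json


def build_prompt(system_message, prompt):
    """Fill the Orca-Vicuna template described in the usage notes."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"


def trim_at_stop(text, stop="</s>"):
    """Cut generated text at the first spelled-out stop token, if any."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


def lower_context(config_text, max_len):
    """Return config.json contents with max_position_embeddings reduced."""
    config = json.loads(config_text)
    config["max_position_embeddings"] = max_len
    return json.dumps(config, indent=2)


# Example usage with hypothetical inputs.
prompt_text = build_prompt("You are a helpful assistant.", "Summarize this file.")
reply = trim_at_stop("Here is the summary.</s>USER: next question")

# Stand-in for the real config.json contents; 32768 is an assumed value.
original = '{"max_position_embeddings": 200000}'
patched = lower_context(original, 32768)
```

In practice `lower_context` would be applied to the text read from the model's `config.json` and the result written back before loading the model. Trimming at `</s>` on the client side is a fallback for backends that do not accept custom stop strings.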