brucethemoose/Yi-34B-200K-DARE-merge-v7
brucethemoose/Yi-34B-200K-DARE-merge-v7 is a 34-billion-parameter language model based on the Yi architecture, with a native context window of 200,000 tokens. It merges several Yi 34B 200K models using the DARE-TIES method, with the goal of strong performance at contexts of 32K tokens and beyond. It averages 73.12 on the Open LLM Leaderboard, whose benchmarks cover reasoning and language understanding.
Overview
brucethemoose/Yi-34B-200K-DARE-merge-v7 is a 34-billion-parameter model built on the Yi architecture and aimed at long-context work. It uses DARE-TIES merging to combine multiple Yi 34B 200K models. The primary goal is good performance at contexts above 32,000 tokens, up to the native 200,000-token window.
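To make "DARE-TIES merging" concrete, here is a toy sketch over flat lists of floats, assuming the commonly described recipe:

- take task vectors (fine-tuned minus base);
- randomly drop entries and rescale the survivors (DARE);
- elect a sign per parameter;
- average only the agreeing deltas (TIES).

The function name `dare_ties_merge` and its parameters are hypothetical. Real merge tools add details this sketch omits, such as normalization options and per-layer weights, and this is not the exact recipe used for this model.

```python
import random


def dare_ties_merge(base, finetuned, weights, drop_rate, seed=0):
    """Toy DARE-TIES merge over flat lists of floats (one entry per parameter)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    keep_prob = 1.0 - drop_rate
    # Task vectors: how far each fine-tuned model moved away from the shared base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # DARE: drop each delta entry with probability drop_rate and rescale the
    # survivors by 1 / keep_prob so each entry's expected value is unchanged.
    sparse = [
        [d / keep_prob if rng.random() < keep_prob else 0.0 for d in delta]
        for delta in deltas
    ]
    merged = []
    for i, b in enumerate(base):
        contribs = [(w, delta[i]) for w, delta in zip(weights, sparse)]
        # TIES sign election: the sign of the weighted sum of deltas wins.
        elected = 1.0 if sum(w * d for w, d in contribs) >= 0 else -1.0
        # Disjoint merge: weighted average of the nonzero deltas that agree
        # with the elected sign; dropped or disagreeing deltas are ignored.
        agree = [(w, d) for w, d in contribs if d != 0.0 and d * elected > 0]
        weight_sum = sum(w for w, _ in agree)
        update = sum(w * d for w, d in agree) / weight_sum if weight_sum else 0.0
        merged.append(b + update)
    return merged


if __name__ == "__main__":
    base = [0.0, 1.0]
    models = [[1.0, 2.0], [3.0, -1.0]]
    # With drop_rate=0.0 nothing is dropped, so only the TIES steps act.
    print(dare_ties_merge(base, models, weights=[1.0, 1.0], drop_rate=0.0))
```

In the example, the two models disagree in sign on the second parameter. Sign election keeps only the larger-magnitude negative delta, so that parameter is not averaged toward zero.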
Key Capabilities
- Extended Context Handling: Tuned for context lengths of 32K+ tokens, which suits tasks such as processing long documents.
- Merged Weights: Created by merging several Yi 34B 200K models. The merge weights vary by layer, favoring Vicuna-format models in the early layers to reinforce the Orca-Vicuna prompt template.
- Prompting: The Orca-Vicuna prompt template is recommended, together with a low temperature and MinP sampling.
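The prompting advice can be sketched as follows. Two assumptions apply:

- The SYSTEM/USER/ASSISTANT layout below is an assumed reading of the "Orca-Vicuna" template and should be checked against the model card.
- `min_p_probs` is a generic illustration of MinP sampling (keep only tokens whose probability is at least `min_p` times the top token's probability), not any backend's exact implementation.

Both function names are hypothetical.

```python
import math


def orca_vicuna_prompt(system, user):
    """Build a single-turn prompt in the assumed Orca-Vicuna layout."""
    # Assumed layout; confirm the exact format against the model card.
    return f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"


def min_p_probs(logits, temperature, min_p):
    """Temperature-scale logits, apply a MinP cutoff, and renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max logit before exponentiating for numerical stability.
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # MinP: drop tokens less likely than min_p times the most likely token.
    # The top token always survives, so kept_total is never zero.
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


if __name__ == "__main__":
    print(orca_vicuna_prompt("You are a helpful assistant.", "Summarize this report."))
    # Illustrative values only, not the card's recommended settings.
    print(min_p_probs([2.0, 1.0, -3.0], temperature=0.8, min_p=0.1))
```

Backends differ in whether MinP is applied before or after temperature. This sketch applies temperature first, so a lower temperature sharpens the distribution before the cutoff is computed.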
Performance & Usage
On the Open LLM Leaderboard, the model averages 73.12. Notable scores include 77.30 on MMLU (5-shot) and 85.99 on HellaSwag (10-shot).

For long-context inference, a context-efficient backend such as exllamav2 is recommended. Full-context backends such as transformers need max_position_embeddings lowered below the model's native 200K value to avoid out-of-memory errors.

The model is best suited to tasks that require understanding very long inputs.
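The config adjustment can be sketched with the standard library, assuming a Hugging Face style config.json that stores a top-level `max_position_embeddings` key. The helper names and the 32768 target are hypothetical and illustrative only.

A rough KV-cache estimate is included as general context for why memory grows with context length. Its architecture values are assumed from the base Yi-34B configuration and should be checked against the model's own config.json:

- 60 layers;
- 8 key/value heads;
- head dimension 128;
- 2-byte fp16 elements.

The estimate is not a claim about what specifically causes the out-of-memory error in transformers.

```python
import json
from pathlib import Path


def capped_config(config, new_max):
    """Return a copy of a config dict with max_position_embeddings lowered to new_max."""
    old = config.get("max_position_embeddings")
    if old is not None and new_max > old:
        raise ValueError(f"new_max={new_max} exceeds the stored value {old}")
    patched = dict(config)  # leave the caller's dict untouched
    patched["max_position_embeddings"] = new_max
    return patched


def cap_config_file(config_path, new_max):
    """Rewrite a local config.json (e.g. in a downloaded model folder) in place."""
    path = Path(config_path)
    patched = capped_config(json.loads(path.read_text()), new_max)
    path.write_text(json.dumps(patched, indent=2))


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    """Approximate KV-cache size: one K and one V vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


if __name__ == "__main__":
    # Assumed Yi-34B-style values; verify against the actual config.json.
    full = kv_cache_bytes(200_000, layers=60, kv_heads=8, head_dim=128, bytes_per_elem=2)
    print(f"{full:,} bytes, about {full / 2**30:.1f} GiB")  # → 49,152,000,000 bytes, about 45.8 GiB
    print(capped_config({"max_position_embeddings": 200_000}, 32_768))
```

The estimate scales linearly with both tokens and bytes per element. Capping the context, or using a backend that stores the cache at lower precision, reduces it proportionally. With transformers, a config attribute passed as a keyword argument to from_pretrained (for example max_position_embeddings=32768) should override the stored value without editing the file.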