brucethemoose/Yi-34B-200K-DARE-merge-v5

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Dec 16, 2023 | License: yi-license | Architecture: Transformer

brucethemoose/Yi-34B-200K-DARE-merge-v5 is a 34 billion parameter language model based on the Yi architecture, with an extended 200K context window. It is a merge of several Yi-based finetunes, including Nous-Capybara, Tess-M, Airoboros, and PlatYi, produced with an experimental "dare ties" merging method. It targets long-context tasks and general conversational use, and scores an average of 71.98 on the Open LLM Leaderboard.


Overview

brucethemoose/Yi-34B-200K-DARE-merge-v5 is a 34 billion parameter large language model built on the Yi architecture, with a 200,000 token context window. It merges several Yi-based finetunes: Nous-Capybara-34B, Tess-M-v1.4, Airoboros-3_1-yi-34b-200k, PlatYi-34B-200K-Q, Pallas-0.4, Yi-34B-200K-AEZAKMI-v2, and a small contribution from SUS-Chat-34B. The merge was performed with mergekit using an experimental "dare ties" implementation, a technique explored in the "Language Models are Super Mario" paper that aims to absorb abilities from homologous models, that is, models finetuned from the same base.
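To give a rough idea of the DARE half of "dare ties", the sketch below applies drop-and-rescale to the difference between a finetune and its base, using plain Python lists in place of weight tensors. It is an illustration only and not mergekit's implementation: the function names `dare_delta` and `merge_with_dare`, the per-model weights, and the drop rate are all assumptions, and the TIES sign-election step is omitted entirely.

```python
import random

def dare_delta(base, finetuned, drop_rate, rng):
    """Drop-and-rescale the delta (finetuned - base) for one flat weight list."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_rate)
    sparse = []
    for b, f in zip(base, finetuned):
        delta = f - b
        # Each delta entry is dropped with probability drop_rate;
        # survivors are rescaled so the expected delta is unchanged.
        if rng.random() < drop_rate:
            sparse.append(0.0)
        else:
            sparse.append(delta * keep_scale)
    return sparse

def merge_with_dare(base, finetunes, weights, drop_rate, seed=0):
    """Add the weighted, sparsified deltas of several finetunes onto the base."""
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    merged = list(base)
    for finetuned, weight in zip(finetunes, weights):
        for i, d in enumerate(dare_delta(base, finetuned, drop_rate, rng)):
            merged[i] += weight * d
    return merged
```

With `drop_rate=0.0` and a single finetune at weight 1.0, the result is just the finetune itself; raising the drop rate sparsifies each delta before the deltas are summed.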

Key Capabilities & Features

  • Extended Context Window: Supports up to 200,000 tokens, making it suitable for processing and generating very long texts.
  • Merged Intelligence: Combines the strengths of several specialized Yi finetunes, potentially enhancing its general reasoning and conversational abilities.
  • Sampling Recommendations: Yi models tend to run "hot," so a lower temperature combined with MinP of 0.02-0.1 and a slight repetition penalty is recommended.
  • Hardware Efficiency: Can run 45K-75K context on 24GB GPUs using exllamav2 and UIs like exui.
  • Benchmark Performance: Achieves an average score of 71.98 on the Open LLM Leaderboard, with notable scores in MMLU (77.22) and HellaSwag (85.54).
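For readers unfamiliar with the MinP setting mentioned in the sampling recommendation, the following is a minimal sketch of the usual min-p rule: tokens whose probability falls below `min_p` times the top token's probability are discarded, and the rest are renormalized. The function name `min_p_filter` and the example distribution are made up for illustration; real backends apply this to the full vocabulary inside their sampler.

```python
def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0: the top token passes its own threshold
    return [p / total for p in kept]

# Hypothetical 4-token distribution, filtered at the upper end of the 0.02-0.1 range.
filtered = min_p_filter([0.6, 0.25, 0.1, 0.05], min_p=0.1)
```

Here the threshold is about 0.06, so only the last token is removed; a smaller `min_p` keeps more of the tail.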

Usage Notes

  • The model uses an Orca-Vicuna prompt template (SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:).
  • Users might need to add </s> as an additional stopping condition, as the model can sometimes spell out the stop token.
  • For full-context backends like transformers, max_position_embeddings in config.json must be lowered from its default of 200,000 to a value that fits in available memory; otherwise loading can fail with Out-Of-Memory errors.
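The usage notes above can be sketched as three small helpers: one fills the Orca-Vicuna template, one cuts generated text at a spelled-out `</s>`, and one lowers `max_position_embeddings` in the text of a `config.json`. The helper names and the 32768 limit in the example are assumptions chosen for illustration, not values from the model card.

```python
import json

def build_prompt(system_message, prompt):
    """Fill the Orca-Vicuna template quoted in the usage notes."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"

def trim_at_stop(text, stop="</s>"):
    """Cut generated text at the first literal stop string, if present."""
    index = text.find(stop)
    return text if index == -1 else text[:index]

def lower_context_limit(config_text, new_limit):
    """Return config.json text with max_position_embeddings capped at new_limit."""
    config = json.loads(config_text)
    current = config["max_position_embeddings"]
    config["max_position_embeddings"] = min(current, new_limit)
    return json.dumps(config, indent=2)

# Example with a stripped-down config; 32768 is an arbitrary illustrative limit.
patched = lower_context_limit('{"max_position_embeddings": 200000}', 32768)
```

In practice the stop string would normally be passed to the inference backend as an extra stopping condition; trimming afterwards is a fallback for backends that do not support custom stop strings.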