brucethemoose/Yi-34B-200K-DARE-megamerge-v8

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 14, 2024 | License: yi-license | Architecture: Transformer

brucethemoose/Yi-34B-200K-DARE-megamerge-v8 is a 34-billion-parameter language model based on the Yi architecture, built for long-context use up to 200,000 tokens. It is a DARE TIES merge of multiple Yi 34B 200K models that aims to perform well at 32K+ context lengths without additional fine-tuning. This makes it suited to applications that process long conversations or documents and need to keep track of context throughout.


Overview

brucethemoose/Yi-34B-200K-DARE-megamerge-v8 is a 34-billion-parameter language model built on the Yi architecture. It uses the DARE TIES merge method to combine numerous Yi 34B 200K models, with the primary goal of strong performance at context lengths of 32K and beyond without further fine-tuning. During the merge, weight gradients were used to bias the initial layers toward Vicuna-format models, which emphasizes the Orca-Vicuna prompt template.
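To make the merge method above more concrete, here is a minimal sketch of the DARE TIES idea on flat lists of numbers. It is a simplified illustration, not the tooling or configuration used for this model: the function names (`dare`, `ties_merge`, `dare_ties`), the `density` value, and the toy inputs are all assumptions chosen for the example. DARE randomly drops entries of each fine-tune's delta from the base and rescales the survivors; the TIES step then elects a sign per parameter and averages only the deltas that agree with it.

```python
import random


def dare(delta, density, rng):
    """DARE step: keep each delta entry with probability `density`,
    rescaling kept entries by 1/density so the expected value is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def ties_merge(deltas, weights):
    """TIES-style step: elect a sign per parameter from the weighted sum,
    then take the weighted mean of the entries that agree with that sign."""
    merged = []
    for column in zip(*deltas):
        total = sum(w * v for w, v in zip(weights, column))
        sign = 1.0 if total >= 0 else -1.0
        kept = [(w, v) for w, v in zip(weights, column) if v * sign > 0]
        norm = sum(w for w, _ in kept)
        merged.append(sum(w * v for w, v in kept) / norm if norm else 0.0)
    return merged


def dare_ties(base, finetuned, weights, density, seed=0):
    """Merge several fine-tuned parameter vectors back onto a shared base."""
    rng = random.Random(seed)
    deltas = [
        dare([f - b for f, b in zip(model, base)], density, rng)
        for model in finetuned
    ]
    merged_delta = ties_merge(deltas, weights)
    return [b + d for b, d in zip(base, merged_delta)]


if __name__ == "__main__":
    base = [0.0, 0.0]
    models = [[1.0, 2.0], [3.0, -2.0]]  # toy "fine-tunes" of the same base
    print(dare_ties(base, models, weights=[1.0, 1.0], density=1.0))
```

In a real merge the per-model weights can vary by layer, which is how a merge can lean toward particular models in the initial layers as described above.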

Key Capabilities

  • Extended Context Handling: Designed to excel with context lengths exceeding 32,000 tokens, up to its native 200,000 token capacity.
  • Merged Intelligence: Combines the strengths of over a dozen different Yi 34B 200K models, including those focused on creative writing, instruction following, and general chat.
  • Sampling Recommendations for Yi: A low temperature, MinP sampling, and a repetition penalty are recommended, because Yi's large vocabulary tends to make outputs run 'hot'.
  • Efficient Deployment: Can run high context (40K-90K) on 24GB GPUs using backends like exllamav2, with aggressive quantization enabling use on 16GB GPUs.
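The sampling settings mentioned in the list above (low temperature, MinP, repetition penalty) can be sketched in plain Python as follows. This is an illustrative sketch only: the default values (`temperature=0.8`, `min_p=0.05`, `penalty=1.05`) are placeholders rather than the model card's recommendations, the function names are hypothetical, and real inference backends may apply these steps in a different order.

```python
import math
import random


def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Make already-seen tokens less likely: divide positive logits by the
    penalty and multiply negative logits by it."""
    out = list(logits)
    for t in set(previous_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def min_p_probs(logits, temperature, min_p):
    """Temperature-scaled softmax, then drop every token whose probability
    is below min_p times the top token's probability, and renormalize."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


def sample(logits, previous_tokens, temperature=0.8, min_p=0.05,
           penalty=1.05, rng=random):
    """Pick the next token id from penalized, MinP-filtered probabilities."""
    penalized = apply_repetition_penalty(logits, previous_tokens, penalty)
    probs = min_p_probs(penalized, temperature, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.1, -3.0]  # hypothetical 4-token vocabulary
    print(sample(toy_logits, previous_tokens=[0]))
```

Because the MinP cutoff scales with the top token's probability, it prunes the long tail of a large vocabulary when the model is confident while still leaving several candidates when it is not.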

Good For

  • Applications requiring deep understanding and generation over very long texts.
  • Scenarios where maintaining coherence and context across extensive conversations or documents is crucial.
  • Users seeking a robust 34B model with strong long-context capabilities without needing further fine-tuning.