brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Nov 28, 2023 | License: yi-license | Architecture: Transformer

brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties is a 34 billion parameter language model based on the Yi architecture, with a 32768 token context length. It is a merge of Nous-Capybara-34B, Tess-M-v1.3, and airoboros-3_1-yi-34b-200k, built with an experimental 'dare ties' merging method. The merge aims for better perplexity and high-context performance than a standard TIES merge, which makes it suitable for complex conversational and long-form text generation tasks.

Model Overview

brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties is a 34 billion parameter language model built on the Yi architecture, with a 32768 token context length. It is a merge of three finetunes: Nous-Capybara-34B, migtissera/Tess-M-v1.3, and bhenrym14/airoboros-3_1-yi-34b-200k. The merge uses an experimental "dare ties" method, which is reported to give better perplexity and high-context results than a traditional TIES merge.
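To make the "dare ties" idea concrete, the sketch below shows the two steps on flat lists of weights: DARE randomly drops each finetune delta and rescales the survivors, and a simplified TIES-style step then elects a sign per parameter and averages the deltas that agree with it. This is an illustration under stated assumptions, not the MergeKit implementation: the function names, the uniform drop rate, and the unweighted averaging are all hypothetical simplifications.

```python
import random


def dare_delta(base, finetuned, drop_rate, rng):
    """DARE step: drop each delta with probability drop_rate,
    then rescale the survivors by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    deltas = []
    for b, f in zip(base, finetuned):
        keep = rng.random() >= drop_rate
        deltas.append((f - b) * scale if keep else 0.0)
    return deltas


def ties_merge(deltas):
    """Simplified TIES step: per parameter, elect the sign of the summed
    deltas and average only the nonzero deltas that share that sign."""
    merged = []
    for column in zip(*deltas):
        positive = sum(column) >= 0
        agreeing = [d for d in column if d != 0.0 and (d > 0) == positive]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base, finetunes, drop_rate=0.5, seed=0):
    """Apply DARE to every finetune, merge the deltas, add them to the base."""
    rng = random.Random(seed)
    deltas = [dare_delta(base, ft, drop_rate, rng) for ft in finetunes]
    return [b + m for b, m in zip(base, ties_merge(deltas))]
```

Real merges operate per tensor and usually apply per-model weights and densities; the sketch only shows why dropped deltas are rescaled and how sign conflicts between finetunes are resolved.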

Key Capabilities & Features

  • Advanced Merging Technique: Utilizes an experimental "dare ties" implementation, as detailed in the paper "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch" (GitHub, MergeKit Dare branch).
  • High Context Window: Built on a 200K-context Yi base model; this listing is served with a 32768 token context length, which still suits processing and generating long texts.
  • Optimized Performance: Demonstrates improved perplexity and high-context performance over previous merge configurations.
  • Orca-Vicuna Prompt Format: Designed to work with the Orca-Vicuna prompt template for instruction following.
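As a minimal sketch of the Orca-Vicuna prompt format mentioned above, the helper below assembles a prompt from `SYSTEM:`, `USER:`, and `ASSISTANT:` lines. The exact layout (line prefixes and newline separators) is an assumption about how the template is commonly written, and the function name is hypothetical; check the upstream model card before relying on it.

```python
def build_orca_vicuna_prompt(user_message, system_message=None, history=()):
    """Assemble a prompt with SYSTEM/USER/ASSISTANT lines (assumed layout).

    history is an iterable of (user_text, assistant_text) pairs from
    earlier turns; the prompt ends with an open ASSISTANT: line.
    """
    lines = []
    if system_message:
        lines.append(f"SYSTEM: {system_message}")
    for past_user, past_assistant in history:
        lines.append(f"USER: {past_user}")
        lines.append(f"ASSISTANT: {past_assistant}")
    lines.append(f"USER: {user_message}")
    lines.append("ASSISTANT:")
    return "\n".join(lines)
```

For example, `build_orca_vicuna_prompt("Summarize this chapter.", "You are a careful editor.")` produces a three-line prompt that ends with `ASSISTANT:` so the model continues from there.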

Usage Considerations

  • Yi-Specific Behavior: Users may need to disable the BOS token or use lower temperatures with MinP to manage Yi's tendency to run "hot."
  • Stop Token Handling: The model may write out </s> as literal text instead of emitting the end-of-sequence token, so add "</s>" as an explicit stopping condition.
  • Hardware Recommendations: 24GB GPUs can run 34B models at 45K-75K context with exllamav2, and exl2 quantizations aimed at story writing are available.
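Two of the considerations above can be sketched in plain Python: cutting a completion at a spelled-out `</s>`, and MinP filtering, which keeps only tokens whose probability is at least `min_p` times that of the most likely token. Both function names are hypothetical, and applying temperature before the MinP filter is an assumption; inference backends differ in sampler order and already ship their own implementations.

```python
import math


def truncate_at_stop(text, stop_strings=("</s>",)):
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def min_p_filter(logits, min_p=0.1, temperature=1.0):
    """Return renormalized probabilities after MinP filtering.

    Temperature is applied first (an assumption), then tokens with
    probability below min_p * max_probability are zeroed out.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]
```

A lower temperature sharpens the distribution before filtering, so fewer low-probability tokens survive the MinP cutoff, which is one way to rein in output that runs "hot."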