sumo43/Yi-34b-x2

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 15, 2024 | License: mit | Architecture: Transformer | Open Weights | Cold

sumo43/Yi-34b-x2 is a 34-billion-parameter language model created by sumo43 by merging jondurbin/bagel-dpo-34b-v0.2 and one-man-army/UNA-34Beagles-32K-bf16-v1 with the SLERP method. It offers a 32K context length and is intended for general language tasks.


Overview

sumo43/Yi-34b-x2 is a 34-billion-parameter language model produced by merging two pre-trained models: jondurbin/bagel-dpo-34b-v0.2 and one-man-army/UNA-34Beagles-32K-bf16-v1. The merge used SLERP (Spherical Linear Interpolation), which interpolates between corresponding weights of the two models along an arc rather than along a straight line, and is often used to combine the strengths of different models while maintaining performance.
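To make the SLERP idea concrete, here is a minimal NumPy sketch of spherical linear interpolation between two weight tensors. It is an illustration under stated assumptions, not the code used to build this model: the function name `slerp`, the `eps` guard, and the fallback to linear interpolation for nearly (anti-)parallel inputs are choices made for this example.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Interpolate between tensors v0 and v1 along an arc; t=0 gives v0, t=1 gives v1."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)

    # Measure the angle between the two tensors using normalized, flattened copies.
    u0 = v0.ravel() / (np.linalg.norm(v0) + eps)
    u1 = v1.ravel() / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)

    # Nearly parallel or anti-parallel: the arc is ill-defined, so fall back
    # to plain linear interpolation (an assumption of this sketch).
    if abs(dot) > 1.0 - 1e-6:
        return (1.0 - t) * v0 + t * v1

    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    w0 = np.sin((1.0 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return w0 * v0 + w1 * v1
```

For two orthogonal unit vectors, `slerp(0.5, a, b)` stays on the unit circle, whereas a plain average of the two would be shorter than either input.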

Merge Details

  • Base Models: The merge combined jondurbin/bagel-dpo-34b-v0.2 and one-man-army/UNA-34Beagles-32K-bf16-v1.
  • Method: SLERP, with separate interpolation weights applied to the self_attn and mlp parameters to control how much each source model contributes at different layers.
  • Context Length: The merged model keeps the 32K-token context length inherited from its base models, which helps when processing long inputs and generating coherent, extended outputs.
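The per-layer weighting mentioned under Method can be sketched as a lookup that picks an interpolation factor for each parameter from its name and layer depth. Everything specific here is hypothetical: the anchor values in `T_SCHEDULE`, the default in `T_DEFAULT`, the function name, and the assumption that anchors are spread evenly over depth are illustrative only and are not taken from this model's merge configuration.

```python
import numpy as np

# Hypothetical schedules: one list of anchor values per parameter group,
# spread evenly from the first layer to the last. Not this model's real values.
T_SCHEDULE = {
    "self_attn": [0.0, 0.5, 0.3, 0.7, 1.0],
    "mlp": [1.0, 0.5, 0.7, 0.3, 0.0],
}
T_DEFAULT = 0.5  # hypothetical factor for parameters matching neither group

def interpolation_factor(param_name, layer_idx, num_layers):
    """Return the interpolation factor t for one parameter at a given layer."""
    for key, anchors in T_SCHEDULE.items():
        if key in param_name:
            # Map the layer index to a relative depth in [0, 1].
            depth = layer_idx / max(num_layers - 1, 1)
            xs = np.linspace(0.0, 1.0, len(anchors))
            # Linearly interpolate between the anchor values at that depth.
            return float(np.interp(depth, xs, anchors))
    return T_DEFAULT
```

With a schedule like this, attention weights in early layers would come mostly from one source model and in late layers mostly from the other, while the mlp weights follow the opposite trend.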

Key Characteristics

This model combines two existing models with the aim of synthesizing their respective capabilities. The merge configuration does not include performance benchmarks. Merges of this kind typically aim to improve overall performance, robustness, or task-specific ability by drawing on the different training data and fine-tuning objectives of the constituent models.

Potential Use Cases

Given its 34 billion parameters and 32K context window, sumo43/Yi-34b-x2 is suitable for a range of demanding natural language processing tasks, including:

  • Advanced text generation and completion
  • Complex reasoning and instruction following
  • Summarization of lengthy documents
  • Conversational AI requiring extended context