brucethemoose/SUS-Bagel-200K-DARE-Test

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 6, 2024 | License: yi-license | Architecture: Transformer

brucethemoose/SUS-Bagel-200K-DARE-Test is a 34 billion parameter experimental language model based on the Yi 4K architecture, extended to a 200K context length. This model is a merge of SUS-Chat-34B, Yi-34B-200K-Llama, and two versions of jondurbin_bagel-34b, utilizing the DARE TIES merge method. It is designed to explore context extension for models like SUS and Bagel, which typically have shorter context windows.


Overview

brucethemoose/SUS-Bagel-200K-DARE-Test is an experimental 34 billion parameter language model created by brucethemoose. Its primary goal is to investigate and extend the context window of existing models like SUS and Bagel, which are typically based on a 4K context Yi architecture. This model specifically aims to push the context length to 200K, addressing limitations where models like DPO Bagel tend to degrade beyond 4K context.

Merge Details

This model was constructed using the DARE TIES merge method, with /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama serving as the base model. The merge incorporates several distinct models:

  • SUSTech_SUS-Chat-34B
  • chargoddard_Yi-34B-200K-Llama
  • jondurbin_bagel-34b-v0.2
  • jondurbin_bagel-dpo-34b-v0.2

The merge assigned a separate weight and density to each component model, as detailed in the model's YAML configuration. Notably, chargoddard_Yi-34B-200K-Llama was merged with a density of 1, meaning none of its parameter differences from the base model were dropped, which fits its role as the source of the context extension. The author also attempted to include Hermes 34B, but this failed because of tokenizer compatibility issues with mergekit.
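To make the role of the weight and density parameters concrete, here is a minimal sketch of the two ideas behind a DARE TIES merge, written on plain Python lists rather than real model tensors. The function names (`dare_sparsify`, `ties_merge`) and the toy numbers are hypothetical and chosen for illustration; this is not mergekit's implementation and does not reproduce the model's actual configuration. It assumes DARE randomly keeps each parameter difference with probability equal to the density and rescales the survivors, and that TIES keeps only the contributions whose sign agrees with the weighted majority.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    if density == 1.0:
        # Density 1 drops nothing, so the delta passes through unchanged.
        return list(delta)
    return [d / density if rng.random() < density else 0.0 for d in delta]


def ties_merge(base, deltas, weights):
    """Elect a sign per parameter, then average only the contributions that agree with it."""
    merged = []
    for i, b in enumerate(base):
        contribs = [w * d[i] for d, w in zip(deltas, weights)]
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        agreeing = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        if agreeing:
            numerator = sum(c for c, _ in agreeing)
            denominator = sum(w for _, w in agreeing)
            merged.append(b + numerator / denominator)
        else:
            # Every contribution was zero or dropped: keep the base value.
            merged.append(b)
    return merged


# Toy "models" with four parameters each (hypothetical values).
base = [0.0, 1.0, -1.0, 0.5]
model_a = [0.2, 1.1, -1.4, 0.5]
model_b = [-0.1, 1.3, -0.8, 0.7]

deltas = [[m - b for m, b in zip(model, base)] for model in (model_a, model_b)]
rng = random.Random(0)
delta_a = dare_sparsify(deltas[0], density=1.0, rng=rng)  # nothing dropped
delta_b = dare_sparsify(deltas[1], density=0.5, rng=rng)  # roughly half dropped
merged = ties_merge(base, [delta_a, delta_b], weights=[0.5, 0.5])
print(merged)
```

In this sketch, a model given a density of 1 contributes every one of its differences from the base, while lower densities thin out a model's contribution at random before the sign election resolves conflicts between the remaining values.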

Intended Use

This model is primarily an experimental merge focused on evaluating the effectiveness of context extension techniques for specific base models. It is suitable for researchers and developers interested in model merging, context window expansion, and the behavior of merged models under extended context conditions.