martyn/llama2-megamerge-dare-13b-v2

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Dec 17, 2023 · License: llama2 · Architecture: Transformer · Open Weights

The martyn/llama2-megamerge-dare-13b-v2 is a 13 billion parameter language model based on the Llama-2 architecture, created by martyn. It is a DARE merge of 17 different Llama-2 13B models, including variants focused on code, mathematics, and instruction following, producing a model that generalizes across instruction styles. With a 4096-token context length, it is designed for diverse conversational and task-oriented applications.


Model Overview

The martyn/llama2-megamerge-dare-13b-v2 is a 13 billion parameter language model built upon the Llama-2 architecture. Developed by martyn, this model is a "mega merge" created using the DARE (Drop And REscale) merging technique, combining 17 distinct Llama-2 13B models. The merge used p = 0.11 (drop probability) and lambda = 2.1 (delta scaling), values the author notes are experimental.
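The merge rule can be sketched under the usual DARE formulation: for each fine-tuned model, compute its delta from the base weights, randomly drop delta entries with probability p, rescale the survivors by 1/(1-p), and add the lambda-weighted result back onto the base. This is a minimal NumPy illustration, not the author's actual merge script; the function name and toy tensors are invented for the example.

```python
import numpy as np

def dare_merge(base, finetuned_list, p=0.11, lam=2.1, seed=0):
    """Sketch of DARE (Drop And REscale) merging.

    p=0.11 and lam=2.1 match the experimental values noted on
    this model card; the real merge operates per-tensor across
    all 17 source checkpoints.
    """
    rng = np.random.default_rng(seed)
    merged = base.copy()
    for ft in finetuned_list:
        delta = ft - base                    # task vector vs. base weights
        mask = rng.random(delta.shape) >= p  # keep each entry with prob 1-p
        merged += lam * (mask * delta) / (1.0 - p)
    return merged

# Toy example on a single 4x4 weight tensor with two "fine-tunes".
base = np.zeros((4, 4))
merged = dare_merge(base, [base + 0.1, base - 0.05])
```

With p = 0 nothing is dropped and the rule reduces to a plain lambda-weighted sum of task vectors, which is a quick sanity check on the implementation.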

Key Capabilities

  • Generalized Instruction Following: The merging of multiple instruction-tuned models aims to enhance the model's ability to understand and respond to a wide variety of instruction styles.
  • Diverse Specializations: By incorporating models like Code-13B, Python-Code-13B, and MetaMath-13B-V1.0, the merge likely inherits capabilities in areas such as code generation, mathematical reasoning, and logical problem-solving.
  • Conversational and Creative: The inclusion of models like Nous-Hermes-Llama2-13b, Synthia-13B, and MythoLogic-L2-13b suggests improved performance in conversational AI, creative writing, and role-playing scenarios.

Good For

  • Versatile Instruction-Following Tasks: Ideal for applications requiring a model that can handle a broad spectrum of prompts and instructions, adapting to different conversational and task-oriented needs.
  • Exploratory AI Development: Suitable for developers looking for a robust 13B model that combines the strengths of multiple specialized Llama-2 variants, offering a generalized yet capable base for further fine-tuning or application development.
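Because the merge generalizes across instruction styles, prompts in any common Llama-2-era instruct format should work. As one hedged illustration, here is a minimal builder for the widely used Alpaca format; the template wording is the standard Alpaca boilerplate, not something specified on this model card.

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the common Alpaca instruct format."""
    header = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request.\n\n")
    if input_text:
        return (header
                + f"### Instruction:\n{instruction}\n\n"
                + f"### Input:\n{input_text}\n\n"
                + "### Response:\n")
    return header + f"### Instruction:\n{instruction}\n\n### Response:\n"

prompt = alpaca_prompt("Write a Python function that reverses a string.")
```

The returned string is passed to the model as-is; generation then continues after the `### Response:` marker.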