martyn/llama2-megamerge-dare-13b-v1

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Dec 12, 2023 | License: llama2 | Architecture: Transformer | Open Weights | Warm

martyn/llama2-megamerge-dare-13b-v1 is a 13-billion-parameter Llama 2-based model created by martyn by merging nine specialized 13B models. The merge combines models focused on code generation, mathematical reasoning, and instruction following, and is intended as a general-purpose base for tasks that mix these skills.


Model Overview

martyn/llama2-megamerge-dare-13b-v1 is a 13-billion-parameter language model built on the Llama 2 architecture. It is a "megamerge" of nine distinct 13B models, including specialized variants for code, mathematics, and general instruction following. The merge was performed with the safetensors-merge-supermario tool using the hyperparameters p=0.1 and lambda=2.
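To make the two hyperparameters concrete, here is a minimal sketch of a DARE-style (drop and rescale) merge on plain Python lists. It is not the code of safetensors-merge-supermario. It assumes that p is the probability of dropping each delta entry, that surviving entries are rescaled by 1/(1-p), and that lambda scales the summed deltas before they are added back to the base weights. The function names `dare_delta` and `dare_merge` are hypothetical, and the tool's actual handling of p, lambda, and multiple models may differ.

```python
import random


def dare_delta(base, tuned, p, rng):
    """Return the delta (tuned - base) after random drop and rescale.

    Assumption: each entry is dropped (set to 0) with probability p,
    and kept entries are multiplied by 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    result = []
    for b, t in zip(base, tuned):
        delta = t - b
        keep = rng.random() >= p  # True with probability 1 - p
        result.append(delta * scale if keep else 0.0)
    return result


def dare_merge(base, tuned_models, p=0.1, lam=2.0, seed=0):
    """Add lam times the sum of the sparsified deltas to the base weights.

    Assumption: deltas from all fine-tuned models are summed,
    in the style of task-arithmetic merging.
    """
    rng = random.Random(seed)
    merged = list(base)
    for tuned in tuned_models:
        delta = dare_delta(base, tuned, p, rng)
        merged = [m + lam * d for m, d in zip(merged, delta)]
    return merged


if __name__ == "__main__":
    # Toy "weights": one base vector and two fine-tuned variants.
    base = [0.0, 1.0, 2.0, 3.0]
    tuned_a = [0.1, 1.1, 2.1, 3.1]
    tuned_b = [0.0, 0.9, 2.2, 3.0]
    print(dare_merge(base, [tuned_a, tuned_b], p=0.1, lam=2.0))
```

With p=0 and lambda=1 the sketch reduces to adding each model's full delta to the base, which is a quick way to sanity-check it. A real merge would apply the same idea tensor by tensor over the model's safetensors files.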

Key Capabilities

This merged model integrates the strengths of its constituent parts, which include:

  • Code Generation: Incorporates capabilities from ajibawa-2023/Code-13B and ajibawa-2023/Python-Code-13B.
  • Mathematical Reasoning: Benefits from the meta-math/MetaMath-13B-V1.0 component.
  • Instruction Following & Chat: Draws on migtissera/Synthia-13B, FPHam/Sydney_Overthinker_13b_HF, allenai/tulu-2-dpo-13b, Doctor-Shotgun/cat-v1.0-13b, and NeverSleep/Noromaid-13b-v0.1.1 for conversational and instruction-based tasks.

Good For

This model is suited to:

  • Multi-domain problem-solving: Tasks that span coding, mathematical logic, and general language understanding.
  • Mixed workloads: Applications where a single model must handle several kinds of prompts and instructions.
  • Exploring merged-model behavior: For developers who want to study how combining multiple specialized models affects performance.