martyn/llama2-megamerge-dare-13b-v1
martyn/llama2-megamerge-dare-13b-v1 is a 13 billion parameter Llama 2-based model created by martyn, resulting from a merge of nine specialized 13B models. The merge combines models focused on code generation, mathematical reasoning, and instruction following. It is intended as a general-purpose base for tasks that mix these three areas.
Model Overview
martyn/llama2-megamerge-dare-13b-v1 is a 13 billion parameter language model built on the Llama 2 architecture. It is a "megamerge" of nine distinct 13B models, including specialized variants for code, mathematics, and general instruction following. The merge was performed with the safetensors-merge-supermario tool using the hyperparameters p=0.1 and lambda=2.
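To give a feel for what a DARE-style merge with these two hyperparameters might look like, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code of safetensors-merge-supermario: it assumes p is the probability of dropping each delta entry, that lambda scales the merged delta, and that the per-model deltas are averaged. The function names dare_delta and dare_merge are hypothetical.

```python
import numpy as np


def dare_delta(base, finetuned, p, rng):
    """Drop-and-rescale the difference between a fine-tuned and a base tensor.

    Assumption: p is the drop probability; survivors are rescaled by 1 / (1 - p)
    so the expected value of the delta is unchanged.
    """
    delta = finetuned - base
    keep = rng.random(delta.shape) >= p  # True with probability 1 - p
    return np.where(keep, delta, 0.0) / (1.0 - p)


def dare_merge(base, finetuned_list, p=0.1, lam=2.0, seed=0):
    """Merge several fine-tuned tensors into one base tensor.

    Assumption: deltas are averaged across models and then scaled by lam.
    The real tool may combine them differently.
    """
    rng = np.random.default_rng(seed)
    merged = base.astype(np.float64).copy()
    for finetuned in finetuned_list:
        merged += lam * dare_delta(base, finetuned, p, rng) / len(finetuned_list)
    return merged


if __name__ == "__main__":
    base = np.zeros((2, 3))
    models = [base + 0.5, base - 0.25]  # toy stand-ins for fine-tuned weights
    print(dare_merge(base, models, p=0.1, lam=2.0))
```

In a real merge this would run once per weight tensor, with every model sharing the same base architecture and tensor shapes.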
Key Capabilities
This merged model integrates the strengths of its constituent parts, which include:
- Code Generation: Incorporates capabilities from ajibawa-2023/Code-13B and ajibawa-2023/Python-Code-13B.
- Mathematical Reasoning: Benefits from the meta-math/MetaMath-13B-V1.0 component.
- Instruction Following & Chat: Leverages models like migtissera/Synthia-13B, FPHam/Sydney_Overthinker_13b_HF, allenai/tulu-2-dpo-13b, Doctor-Shotgun/cat-v1.0-13b, and NeverSleep/Noromaid-13b-v0.1.1 for enhanced conversational and instruction-based performance.
Good For
This model is suitable for:
- Multi-domain problem-solving: Tasks that span coding, mathematical logic, and general language understanding.
- Versatile AI applications: Where a single model needs to handle diverse types of prompts and instructions effectively.
- Exploration of merged model performance: For developers interested in the synergistic effects of combining multiple specialized models.
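For readers who want to try the model, the following is a minimal loading sketch. It assumes the weights are published on the Hugging Face Hub under the ID above in a format that the transformers library can load, and that torch, transformers, and accelerate are installed. The helper names load_model and generate are hypothetical, and the expected prompt format is not stated in this card, so the prompt is passed through unchanged.

```python
MODEL_ID = "martyn/llama2-megamerge-dare-13b-v1"


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported here so the heavy dependencies are only needed when called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to reduce memory use
        device_map="auto",          # needs the accelerate package
    )
    return tokenizer, model


def generate(prompt, max_new_tokens=128):
    """Generate a completion for a raw prompt (no chat template applied)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as generate("Write a Python function that reverses a string.") would exercise the code-oriented side of the merge. A 13B model in half precision needs roughly 26 GB of memory for the weights alone.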