Undi95/ReMM-L2-13B-v1

TEXT GENERATION

Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Sep 1, 2023 | License: cc-by-nc-4.0 | Architecture: Transformer

Undi95/ReMM-L2-13B-v1 is a 13 billion parameter language model built on the Llama-2-13B base as a recreation of the original MythoMax-L2-13b. It is a merge of several updated instruction-tuned models, including Chronos-Beluga-v2, Airoboros-L2, Nous-Hermes, and Huginn-13b. It is designed for general-purpose conversational AI and instruction following, and posts competitive benchmark results for its size.


Overview

Undi95/ReMM-L2-13B-v1 is a 13 billion parameter model that serves as a recreation and update of the original MythoMax-L2-13b. It is built on the Llama-2-13B base and incorporates a sophisticated merging strategy, combining several high-performing instruction-tuned models to enhance its capabilities.

Key Capabilities & Architecture

This model is the result of a multi-stage TIES-merging process, integrating:

  • The-Face-Of-Goonery/Chronos-Beluga-v2-13bfp16
  • jondurbin/airoboros-l2-13b-2.1
  • NousResearch/Nous-Hermes-Llama2-13b
  • The-Face-Of-Goonery/Huginn-13b-v1.2

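The multi-stage TIES merge named above was performed with the author's own tooling, so the following is only a toy illustration of what a TIES merge does, not the exact procedure used to build ReMM. The sketch (assuming NumPy, flat weight vectors, and a hypothetical `ties_merge` helper) shows the three TIES steps: trim small task-vector entries, elect a majority sign per parameter, then average the surviving entries that agree with that sign.

```python
# Toy TIES-merge sketch (illustrative only, not the actual ReMM merge code).
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Merge fine-tuned weight vectors into `base` via the TIES steps."""
    deltas = [ft - base for ft in finetuned]  # task vectors
    trimmed = []
    for d in deltas:
        # Trim: keep only the top-density fraction of entries by magnitude.
        k = int(np.ceil(density * d.size))
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    # Elect: pick the dominant sign per parameter across models.
    sign = np.sign(stacked.sum(axis=0))
    # Disjoint merge: average only the entries agreeing with the elected sign.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = (stacked * agree).sum(axis=0) / counts
    return base + merged_delta

base = np.zeros(4)
experts = [np.array([1.0, -0.1, 2.0, 0.0]),
           np.array([1.0, 0.2, -2.0, 0.0])]
merged = ties_merge(base, experts, density=0.5)
# The conflicting third parameter (+2 vs -2) cancels; the agreeing first
# parameter survives: merged == [1., 0., 0., 0.]
```

Note how sign conflicts between donor models are resolved rather than averaged away, which is the main reason TIES merges tend to preserve each donor's strengths better than a naive weight average.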
This merging approach aims to consolidate the strengths of these diverse models, resulting in a robust general-purpose language model. It utilizes the Alpaca prompt template for instruction following.

Performance Highlights

Evaluated on the Open LLM Leaderboard, ReMM-L2-13B-v1 demonstrates solid performance for its size:

  • Avg. Score: 52.58
  • ARC (25-shot): 59.73
  • HellaSwag (10-shot): 83.1
  • MMLU (5-shot): 54.11
  • Winogrande (5-shot): 74.51

While its GSM8K (2.96) and DROP (43.7) scores point to weaknesses in multi-step arithmetic reasoning and reading comprehension, its overall average makes it a capable model for a wide range of instruction-following tasks.

When to Use This Model

  • General-purpose instruction following: Suitable for tasks requiring conversational responses, summarization, and creative text generation.
  • Experimentation with merged models: Ideal for developers interested in the performance characteristics of models created through advanced merging techniques.
  • Resource-constrained environments: As a 13B parameter model, it offers a balance between performance and computational requirements compared to larger models.