beberik/Lonepino-11B

Text generation · Model size: 10.7B · Quantization: FP8 · Context length: 4k · Published: Jan 8, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open weights

beberik/Lonepino-11B is a 10.7 billion parameter language model with a 4096-token context length, created by merging several existing models, including Intel/neural-chat-7b-v3-3-Slerp and NeverSleep/Noromaid-7b-v0.2. It is the product of a multi-stage mergekit merge that combines layers from different models into its final composition, and it is designed as a general-purpose conversational model suitable for a variety of text generation tasks.


Lonepino-11B: A Merged Language Model

Lonepino-11B is a 10.7 billion parameter language model developed by beberik, constructed through a sophisticated multi-stage merging process using mergekit. This model integrates components from several established language models to combine their strengths.

Key Capabilities & Composition

  • Architecture: A blend of Intel/neural-chat-7b-v3-3-Slerp, NeverSleep/Noromaid-7b-v0.2, chargoddard/loyal-piano-m7-cdpo, and maywell/PiVoT-0.1-Starling-LM-RP.
  • Merging Strategy: Utilizes a layered merging approach, first creating intermediate merges like "neural-maid-11B" and "loyal-PiVoT-11B" before a final slerp merge to form Lonepino-11B.
  • Context Length: Supports a context window of 4096 tokens.
  • Performance: Achieves an average score of 70.10 on the Open LLM Leaderboard, with specific scores including 68.26 on AI2 Reasoning Challenge and 63.76 on MMLU.
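The layered merge described above can be sketched as a mergekit configuration. The following is an illustrative example of the first-stage passthrough (layer-stacking) merge that could produce an 11B intermediate like "neural-maid-11B" from two 7B models; the layer ranges and dtype are assumptions, not the author's exact recipe:

```yaml
# Illustrative sketch only — layer ranges and dtype are assumed, not
# taken from the published merge recipe.
# Stage 1: stack layer ranges from two 7B models into an ~11B model.
slices:
  - sources:
      - model: Intel/neural-chat-7b-v3-3-Slerp
        layer_range: [0, 24]
  - sources:
      - model: NeverSleep/Noromaid-7b-v0.2
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

Per the description above, a second passthrough merge produces the other intermediate ("loyal-PiVoT-11B"), and a final slerp merge of the two 11B intermediates yields Lonepino-11B.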

Prompting & Usage

  • Flexible Prompting: Compatible with common prompt templates such as Alpaca or ChatML, allowing for versatile application.
  • General Purpose: Positioned as a general-purpose model rather than a specialized one, suitable for a wide range of text generation and conversational AI tasks.
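Since the model accepts common templates like Alpaca and ChatML, prompts can be built with small formatting helpers. The functions below follow the widely used generic layouts of those two templates; they are not an official template shipped with beberik/Lonepino-11B:

```python
# Generic Alpaca and ChatML prompt builders. These reflect the common
# community layouts for the two templates, not a template published
# specifically for beberik/Lonepino-11B.

def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Alpaca layout: instruction, optional input, open response section."""
    if input_text:
        return (
            "### Instruction:\n" + instruction +
            "\n\n### Input:\n" + input_text +
            "\n\n### Response:\n"
        )
    return "### Instruction:\n" + instruction + "\n\n### Response:\n"


def chatml_prompt(system: str, user: str) -> str:
    """ChatML layout: <|im_start|>role ... <|im_end|> blocks,
    ending with an open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

The resulting string is passed to the tokenizer as-is; generation then continues from the open response (Alpaca) or assistant (ChatML) section.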

Good for

  • Developers experimenting with merged models and their performance characteristics.
  • General text generation and conversational applications where a 10.7B parameter model is appropriate.
  • Use cases requiring a model built from a diverse set of base models.