shadowml/Marcoro14-7B-slerp

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Dec 30, 2023 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

shadowml/Marcoro14-7B-slerp is a 7-billion-parameter language model created by shadowml as a slerp merge of AIDC-ai-business/Marcoroni-7B-v3 and EmbeddedLLM/Mistral-7B-Merge-14-v0.1. It is a general-purpose model for language understanding and generation with a 4096-token context window. The merge is intended to combine the strengths of both source models and give balanced performance across text-based tasks.


Model Overview

Marcoro14-7B-slerp is a 7-billion-parameter language model developed by shadowml. It was produced with mergekit as a slerp (spherical linear interpolation) merge of two source models: AIDC-ai-business/Marcoroni-7B-v3 and EmbeddedLLM/Mistral-7B-Merge-14-v0.1. Slerp interpolates between the two models' weights along an arc rather than a straight line, and an interpolation factor t controls how far the result sits from one model toward the other.
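To make the interpolation concrete, here is a minimal sketch of slerp between two plain Python vectors. It illustrates the general technique only and is not mergekit's implementation; the function name `slerp`, the `eps` threshold, and the fallback to ordinary linear interpolation for near-parallel or zero-length inputs are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0, t = 1 returns v1. Illustrative sketch only.
    """
    # Plain linear interpolation, used as a fallback below.
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        return lerp  # the angle is undefined for a zero-length vector

    # Angle between the two vectors, from the clamped cosine.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        return lerp  # vectors are (anti)parallel; slerp weights would blow up

    # Standard slerp weights for each endpoint.
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

For two orthogonal unit vectors, the midpoint stays on the unit circle, whereas a straight-line average would have a shorter length. In a weight merge the same idea would be applied to each pair of corresponding tensors, flattened to vectors.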

Key Characteristics

  • Merged Architecture: Built with a slerp merge that applies different interpolation values (t) to the self-attention layers and the MLP layers, rather than a single value for the whole model.
  • Base Models: Merged from Marcoroni-7B-v3 and Mistral-7B-Merge-14-v0.1, with the aim of combining their language understanding and generation abilities.
  • Parameter Count: 7 billion parameters.
  • Context Length: Supports a context window of 4096 tokens, suitable for processing moderately long texts.
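The per-layer-type interpolation mentioned above can be sketched as a lookup from parameter name to t. Everything in this example is hypothetical: the helper `pick_t`, the rule list, the parameter names, and especially the numeric values, which are placeholders and not the values used to build this model.

```python
def pick_t(param_name, rules, default_t):
    """Return the interpolation factor for one parameter.

    rules is a list of (substring, t) pairs; the first substring found in
    param_name wins, otherwise default_t is used.
    """
    for substring, t in rules:
        if substring in param_name:
            return t
    return default_t


# Placeholder values for illustration; NOT this model's actual settings.
RULES = [("self_attn", 0.3), ("mlp", 0.7)]
DEFAULT_T = 0.5

# Example parameter names, assumed here for illustration.
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.input_layernorm.weight",
]
chosen = {name: pick_t(name, RULES, DEFAULT_T) for name in names}
```

With rules like these, attention weights would lean toward one source model and MLP weights toward the other, while all remaining tensors use the default.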

Potential Use Cases

This model is intended for general-purpose applications that can benefit from the combined capabilities of its two source models. Candidate tasks include:

  • Text generation and completion.
  • Summarization and information extraction.
  • Conversational AI and chatbots.
  • Exploration of merged model performance for specific tasks.