invalid-coder/SOLAR-10.7B-Instruct-SOLARC-M-10.7B-slerp
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 10.7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 10, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

invalid-coder/SOLAR-10.7B-Instruct-SOLARC-M-10.7B-slerp is a 10.7 billion parameter language model created by invalid-coder through a slerp merge of upstage/SOLAR-10.7B-Instruct-v1.0 and DopeorNope/SOLARC-M-10.7B. The merge is intended to combine the strengths of its two parent models, yielding a balanced profile for general instruction-following tasks. Its 4096-token context length supports a variety of conversational and text generation applications.


Model Overview

invalid-coder/SOLAR-10.7B-Instruct-SOLARC-M-10.7B-slerp is a 10.7 billion parameter language model developed by invalid-coder. It is the product of a slerp (spherical linear interpolation) merge combining two base models:

  • upstage/SOLAR-10.7B-Instruct-v1.0
  • DopeorNope/SOLARC-M-10.7B

This merging technique aims to combine the beneficial characteristics of both parent models, potentially leading to improved performance across various tasks without additional training.

Key Characteristics

  • Parameter Count: 10.7 billion parameters, offering a balance between computational efficiency and capability.
  • Merge Method: Uses slerp (spherical linear interpolation), which blends model weights along the arc between them rather than along a straight line, preserving the magnitude of the interpolated weights more faithfully than plain linear averaging.
  • Configuration: The merge configuration specifies how different layers and components (like self_attn and mlp) from the source models are weighted during the interpolation process.
  • Context Length: Supports a context window of 4096 tokens, suitable for handling moderately long inputs and generating coherent responses.
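To make the merge method concrete, the sketch below implements slerp for a pair of flattened weight vectors using NumPy. This is an illustrative implementation of the general slerp formula, not the exact code used to produce this model (tools like mergekit apply it per-tensor with per-layer interpolation factors for components such as `self_attn` and `mlp`):

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate values follow the
    great-circle arc between the two directions.
    """
    # Normalize copies only to measure the angle between the vectors
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if abs(theta) < eps:
        # Nearly parallel vectors: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

In a full model merge this function would be applied tensor-by-tensor across both checkpoints, with the interpolation factor `t` optionally varying by layer and module type.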

Use Cases

This model is designed for general instruction-following and text generation tasks. Developers can integrate it into applications requiring:

  • Conversational AI
  • Content creation
  • Summarization
  • Question answering

Its merged weights suggest a versatile performance profile, making it a suitable choice for a broad range of natural language processing applications.
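For the use cases above, a minimal way to prepare an instruction prompt is shown below. The `### User:` / `### Assistant:` format is the convention used by the upstream SOLAR-Instruct family and is assumed to carry over to this merge; verify against the model's tokenizer chat template before relying on it:

```python
def build_prompt(user_message: str) -> str:
    """Format a single-turn instruction in the assumed SOLAR-Instruct style."""
    return f"### User:\n{user_message}\n\n### Assistant:\n"

prompt = build_prompt("Summarize the benefits of model merging in two sentences.")
```

The resulting string can then be tokenized and passed to the model (for example via the Hugging Face `transformers` `generate` API) within the 4096-token context window.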