invalid-coder/Sakura-SOLAR-Instruct-CarbonVillain-en-10.7B-v2-slerp
Text Generation | Concurrency Cost: 1 | Model Size: 10.7B | Quant: FP8 | Ctx Length: 4k | Published: Jan 10, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

invalid-coder/Sakura-SOLAR-Instruct-CarbonVillain-en-10.7B-v2-slerp is a 10.7-billion-parameter language model created by invalid-coder by merging jeonsworld/CarbonVillain-en-10.7B-v2 and kyujinpy/Sakura-SOLAR-Instruct with the slerp method. The merge is intended to combine the strengths of its parent models, with the goal of a balanced performance profile on general instruction-following tasks. It has a 4096-token context length and is aimed at developers who want a merged model for a variety of applications.


Model Overview

invalid-coder/Sakura-SOLAR-Instruct-CarbonVillain-en-10.7B-v2-slerp is a 10.7-billion-parameter language model developed by invalid-coder. It was produced by merging two base models with the slerp (spherical linear interpolation) method:

  • jeonsworld/CarbonVillain-en-10.7B-v2
  • kyujinpy/Sakura-SOLAR-Instruct
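For reference, the standard slerp formula for interpolating between two weight tensors theta_A and theta_B, treated as flattened vectors, with interpolation factor t in [0, 1] is shown below. At t = 0 the result equals theta_A and at t = 1 it equals theta_B. Applying the formula tensor by tensor, and which parent plays the role of theta_A, are assumptions here; the card does not spell out these details.

```latex
\operatorname{slerp}(\theta_A, \theta_B; t)
  = \frac{\sin\big((1-t)\,\Omega\big)}{\sin\Omega}\,\theta_A
  + \frac{\sin(t\,\Omega)}{\sin\Omega}\,\theta_B,
\qquad
\cos\Omega = \frac{\theta_A \cdot \theta_B}{\lVert\theta_A\rVert\,\lVert\theta_B\rVert}
```

Unlike plain linear interpolation, which cuts straight across, slerp follows the arc between the two directions, so the interpolated weights do not shrink in norm when the parents point in different directions.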

This merge aims to combine the capabilities of both parent models, potentially yielding a more robust and versatile instruction-following model. The merge configuration sets separate interpolation factors (t values) for the self-attention and MLP layers, so the two parents are blended in different proportions depending on the layer type.
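The sketch below illustrates how a slerp merge with layer-type-specific t values could work. It is a minimal pure-Python example, not the tooling or configuration actually used for this model. The schedules in `T_SCHEDULE`, `DEFAULT_T`, the convention of spreading anchor values evenly over depth, the Llama-style parameter names, and the 48-layer count are all hypothetical placeholders; the card states only that self-attention and MLP layers use distinct t values.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def slerp(t: float, v0: Sequence[float], v1: Sequence[float],
          dot_threshold: float = 0.9995) -> Vector:
    """Spherically interpolate between two flattened weight tensors.

    t = 0 returns v0 and t = 1 returns v1. The angle is measured between the
    normalized tensors, but the weights s0 and s1 are applied to the original
    tensors. Nearly parallel or anti-parallel (or zero) tensors fall back to
    linear interpolation, because sin(omega) would be close to zero.
    """
    if len(v0) != len(v1):
        raise ValueError("tensors must have the same number of elements")
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    if n0 == 0.0 or n1 == 0.0:
        return lerp
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # clamp rounding error
    if abs(cos_omega) > dot_threshold:
        return lerp
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    s0 = math.sin((1.0 - t) * omega) / sin_omega
    s1 = math.sin(t * omega) / sin_omega
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# HYPOTHETICAL schedules: the real t values are not listed on the card.
# Anchors are spread evenly over depth and linearly interpolated in between
# (an assumed convention).
T_SCHEDULE: Dict[str, List[float]] = {
    "self_attn": [0.0, 0.5, 1.0],
    "mlp": [1.0, 0.5, 0.0],
}
DEFAULT_T = 0.5  # hypothetical t for all other tensors


def t_for_layer(anchors: Sequence[float], layer_idx: int, num_layers: int) -> float:
    """Linearly interpolate anchor t values across layer depth."""
    if len(anchors) == 1 or num_layers <= 1:
        return float(anchors[0])
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(pos), len(anchors) - 1)
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return anchors[lo] * (1.0 - frac) + anchors[hi] * frac


def pick_t(param_name: str, layer_idx: int, num_layers: int) -> float:
    """Choose t for a tensor by matching its (assumed Llama-style) name."""
    for key, anchors in T_SCHEDULE.items():
        if key in param_name:
            return t_for_layer(anchors, layer_idx, num_layers)
    return DEFAULT_T


if __name__ == "__main__":
    # Two orthogonal unit vectors: slerp keeps unit norm, lerp would not.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))  # ≈ [0.7071, 0.7071]
    NUM_LAYERS = 48  # assumed layer count for a SOLAR-10.7B-style model
    print(pick_t("model.layers.0.self_attn.q_proj.weight", 0, NUM_LAYERS))  # 0.0
    print(pick_t("model.layers.47.mlp.up_proj.weight", 47, NUM_LAYERS))  # 0.0
```

With opposite schedules like these, early attention layers lean toward one parent while early MLP layers lean toward the other, and the preference reverses with depth. This is one way "distinct t values for self-attention and MLP layers" can play out, shown here only as an illustration.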

Key Capabilities

  • Instruction Following: Inherits instruction-tuned capabilities from its base models.
  • Merged Weights: Combines the weights of CarbonVillain-en-10.7B-v2 and Sakura-SOLAR-Instruct in a single model of the same size.
  • Standard Context Window: Supports a context length of 4096 tokens, suitable for a range of conversational and text generation tasks.
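Because the 4096-token window is shared between the prompt and the generated reply, a client has to reserve room for the output. The sketch below shows that arithmetic under stated assumptions: token IDs are taken to come from the model's own tokenizer, and keeping the most recent tokens (left truncation) is just one possible policy. The helper names `prompt_budget` and `fit_prompt` are hypothetical.

```python
from typing import List

CONTEXT_LENGTH = 4096  # context window stated on the model card


def prompt_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Prompt tokens that still leave room for max_new_tokens of output."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def fit_prompt(token_ids: List[int], max_new_tokens: int,
               context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Keep only the most recent tokens so prompt + output fit the window."""
    budget = prompt_budget(max_new_tokens, context_length)
    return token_ids[-budget:] if len(token_ids) > budget else list(token_ids)


if __name__ == "__main__":
    print(prompt_budget(512))  # 3584
    trimmed = fit_prompt(list(range(5000)), 512)
    print(len(trimmed), trimmed[0])  # 3584 1416
```

Reserving 512 tokens for the reply leaves 3584 for the prompt; a longer conversation history would need to be trimmed or summarized to fit.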

Good For

  • General Text Generation: Creating coherent and contextually relevant text based on prompts.
  • Instruction-Based Tasks: Responding to user instructions and queries effectively.
  • Experimentation: Developers interested in exploring the performance characteristics of slerp-merged models.