dddsaty/Merge_Sakura_Solar

Text Generation | Concurrency Cost: 1 | Model Size: 10.7B | Quant: FP8 | Ctx Length: 4K | Published: Feb 7, 2024 | License: cc-by-nc-sa-4.0 | Architecture: Transformer | Open Weights

dddsaty/Merge_Sakura_Solar is a 10.7 billion parameter language model merged from three Sakura-SOLAR-Instruct variants, including one specifically fine-tuned for mathematical tasks. This model builds on the SOLAR architecture and is designed for general instruction following, with a notable strength in mathematical reasoning. It offers a 4096-token context length and performs well across standard language-model benchmarks.


Model Overview

dddsaty/Merge_Sakura_Solar is a 10.7 billion parameter language model created by merging three distinct models from the kyujinpy collection using mergekit. The base models include:

  • Sakura-SOLAR-Instruct
  • Sakura-SOLRCA-Math-Instruct-DPO-v2 (specifically optimized for mathematical reasoning)
  • Sakura-SOLRCA-Instruct-DPO

This merge aims to combine the strengths of these instruction-tuned models, enhancing mathematical problem-solving in particular while preserving strong general instruction-following.
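The card names the source checkpoints but not the exact mergekit configuration (merge method, per-model weights), so the following is only a minimal sketch of the simplest case: a uniform linear average of the three checkpoints' parameters, written against the Hugging Face transformers API. Mergekit itself supports this and more sophisticated methods such as SLERP; the local output path is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM

# The three source checkpoints named in the model card.
MODEL_IDS = [
    "kyujinpy/Sakura-SOLAR-Instruct",
    "kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2",
    "kyujinpy/Sakura-SOLRCA-Instruct-DPO",
]

# Load all three models in half precision. Note the memory cost:
# three 10.7B checkpoints at fp16 is roughly 64 GB of RAM.
models = [
    AutoModelForCausalLM.from_pretrained(mid, torch_dtype=torch.float16)
    for mid in MODEL_IDS
]

# Uniformly average every parameter tensor into the first model
# (the actual merge may have used non-uniform weights or SLERP).
merged = models[0]
param_maps = [dict(m.named_parameters()) for m in models]
with torch.no_grad():
    for name, param in merged.named_parameters():
        param.copy_(torch.stack([pm[name] for pm in param_maps]).mean(dim=0))

merged.save_pretrained("./merge-sakura-solar-local")  # placeholder path
```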

Performance Benchmarks

The model demonstrates competitive performance across the six benchmarks below, with an average score of 74.03:

  • ARC: 70.73
  • HellaSwag: 88.51
  • MMLU: 66.03
  • TruthfulQA: 72.21
  • Winogrande: 82.72
  • GSM8K: 63.99
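For reference, the reported average is the plain mean of these six task scores:

```python
scores = {
    "ARC": 70.73, "HellaSwag": 88.51, "MMLU": 66.03,
    "TruthfulQA": 72.21, "Winogrande": 82.72, "GSM8K": 63.99,
}
print(round(sum(scores.values()) / len(scores), 2))  # 74.03
```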

Key Capabilities

  • General Instruction Following: Designed to respond effectively to a wide range of prompts and instructions.
  • Enhanced Mathematical Reasoning: Benefits from the inclusion of a math-specific fine-tune, making it more capable in numerical and logical tasks.

When to Use This Model

This model is a strong candidate for applications requiring a balanced instruction-following model with a particular emphasis on mathematical accuracy. Its 10.7B parameter count makes it a powerful option for tasks where smaller models fall short, and its 4096-token context length supports moderately long interactions.
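A minimal inference sketch with the transformers library follows. The SOLAR-style "### User: / ### Assistant:" prompt template is an assumption based on the base models; check the upstream model cards for the exact format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dddsaty/Merge_Sakura_Solar"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# SOLAR-style instruct template (assumed; verify against the base models).
prompt = (
    "### User:\n"
    "A rectangle is 12 cm long and 7 cm wide. What is its area?\n\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```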