vicgalle/SOLAR-13B-Instruct-v1.0

Text Generation · Open Weights

  • Concurrency Cost: 1
  • Model Size: 15B
  • Quant: FP8
  • Ctx Length: 8k
  • Tool Calling: Supported
  • Published: Jan 13, 2024
  • License: apache-2.0
  • Architecture: Transformer

vicgalle/SOLAR-13B-Instruct-v1.0 is a 15 billion parameter instruction-tuned language model, upscaled from SOLAR-10.7B by vicgalle using a passthrough merge. Its size was chosen so that a 4-bit quantization fits within the VRAM of 12GB GPU cards. It averages 56.65 on the Open LLM Leaderboard, with notable scores on HellaSwag (78.03) and Winogrande (70.24).


SOLAR-13B-Instruct-v1.0 Overview

vicgalle/SOLAR-13B-Instruct-v1.0 is a 15 billion parameter instruction-tuned language model, developed by vicgalle through a 'passthrough' merge of the upstage/SOLAR-10.7B-Instruct-v1.0 model. The 13B target size was chosen with VRAM in mind: at that size, a 4-bit quantized version fits within a typical 12GB GPU card.
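
As a rough sanity check on the 12GB figure, the weight footprint can be estimated from the parameter count and the bits stored per weight. The sketch below is a back-of-the-envelope calculation only: the function name `weight_gb` is illustrative, 1 GB is taken as 10^9 bytes, and the estimate ignores the KV cache, activations, and per-block quantization metadata, all of which add to real usage.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (assumes 1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Both parameter counts mentioned for this model, at 4-bit and 16-bit precision.
for n_params in (13e9, 15e9):
    for bits in (4, 16):
        size = weight_gb(n_params, bits)
        print(f"{n_params / 1e9:.0f}B params @ {bits}-bit: {size:.1f} GB")
```

Under these assumptions the 4-bit weights come to roughly 6.5 to 7.5 GB, which leaves headroom on a 12GB card, while 16-bit weights (26 to 30 GB) would not fit.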

Key Capabilities and Performance

This model is designed for general instruction-following tasks. Specific use cases are still being evaluated, but it is aimed at settings where VRAM is limited. On the Open LLM Leaderboard, its scores are strongest on HellaSwag and Winogrande and weakest on GSM8k:

  • Average Score: 56.65
  • AI2 Reasoning Challenge (25-shot): 57.25
  • HellaSwag (10-shot): 78.03
  • MMLU (5-shot): 55.75
  • TruthfulQA (0-shot): 61.99
  • Winogrande (5-shot): 70.24
  • GSM8k (5-shot): 16.60
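
The average can be cross-checked from the six task scores listed above. This is a minimal sketch using only the rounded figures from this page; the rounded scores average to about 56.64, a hair below the reported 56.65, presumably because the leaderboard averages unrounded values (an assumption, not something the page states).

```python
# Per-task scores as listed above (already rounded to two decimals).
scores = {
    "ARC (25-shot)": 57.25,
    "HellaSwag (10-shot)": 78.03,
    "MMLU (5-shot)": 55.75,
    "TruthfulQA (0-shot)": 61.99,
    "Winogrande (5-shot)": 70.24,
    "GSM8k (5-shot)": 16.60,
}

average = sum(scores.values()) / len(scores)
strongest = max(scores, key=scores.get)
weakest = min(scores, key=scores.get)

print(f"mean of rounded scores: {average:.2f}")
print(f"strongest: {strongest}, weakest: {weakest}")
```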

Prompt Format

The model uses the same prompt template as its base, SOLAR-10.7B:

<s> ### User:
{prompt}

### Assistant:
{response}</s>
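
The template above can be filled in with a small helper. This is a minimal sketch for a single-turn prompt: the function name `build_prompt` and the `include_bos` flag are illustrative, not part of any library. Many tokenizers prepend `<s>` on their own, so the flag lets the caller leave it out of the string in that case (whether a given tokenizer does so is an assumption to verify).

```python
def build_prompt(user_message: str, include_bos: bool = True) -> str:
    """Format a single-turn prompt that ends where the model should start writing."""
    # "<s>" is the beginning-of-sequence marker shown in the template.
    bos = "<s> " if include_bos else ""
    return f"{bos}### User:\n{user_message}\n\n### Assistant:\n"


print(build_prompt("Summarize the plot of Hamlet in two sentences."))
```

The prompt stops right after `### Assistant:` so that the model's generation fills the `{response}` slot; the closing `</s>` is emitted by the model as its end-of-sequence token rather than supplied by the caller.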

When to Consider This Model

  • VRAM-constrained environments: Because a 4-bit quantization fits within 12GB of VRAM, it is a candidate for local deployment on consumer-grade GPUs.
  • General instruction-following: The model is instruction-tuned, and its leaderboard scores suggest it can handle a variety of common NLP tasks, although the low GSM8k score (16.60) points to weaker multi-step math reasoning.
  • Experimentation with merged models: Users interested in exploring models created via mergekit's passthrough method might find this an interesting case study.
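
To make the passthrough idea concrete: this kind of merge copies layer ranges from a source model unchanged and stacks them, which is how a model can grow in depth without further training. The toy sketch below only tracks layer indices. The function name and the 8-layer example with overlapping slices are hypothetical and are not the slice configuration actually used for this model.

```python
from typing import List, Tuple


def passthrough_stack(slices: List[Tuple[int, int]]) -> List[int]:
    """Return the source-layer index of each layer in the merged model, in order.

    Each slice is a half-open range (start, end) of layers copied unchanged.
    """
    merged: List[int] = []
    for start, end in slices:
        merged.extend(range(start, end))
    return merged


# Hypothetical 8-layer source model; the overlapping slices repeat layers 2..5.
merged_layers = passthrough_stack([(0, 6), (2, 8)])
print(len(merged_layers), merged_layers)
```

In this toy case an 8-layer source becomes a 12-layer model because the overlapping range is included twice; no weights are averaged or otherwise modified.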