Inv/Konstanta-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 3, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Inv/Konstanta-7B is a 7 billion parameter language model created by Inv by merging SanjiWatsuki/Kunoichi-DPO-v2-7B, maywell/PiVoT-0.1-Evil-a, and mlabonne/NeuralOmniBeagle-7B-v2 with the dare_ties method. It is a test merge with a 4096-token context length, intended to improve performance by combining models with strong individual results. It achieves an average score of 73.54 on the Open LLM Leaderboard, spanning reasoning, common-sense, and question-answering tasks.


Konstanta-7B Overview

Konstanta-7B is a 7 billion parameter language model developed by Inv, created through a merge of three distinct models: SanjiWatsuki/Kunoichi-DPO-v2-7B, maywell/PiVoT-0.1-Evil-a, and mlabonne/NeuralOmniBeagle-7B-v2. The merge was executed using the dare_ties method within LazyMergekit, with the goal of combining the strengths of the constituent models and, in particular, improving on the Kunoichi base model.
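The exact merge recipe is not reproduced here, but a dare_ties merge in mergekit (which LazyMergekit wraps) is described by a YAML file along the following lines. This is a minimal sketch: the base model choice and the density/weight values are illustrative assumptions, not Konstanta-7B's published settings.

```yaml
# Illustrative mergekit config for a dare_ties merge of the three
# constituent models. density controls how many delta parameters are
# kept per model; weight controls each model's contribution.
# Values and base_model are assumptions for this sketch.
models:
  - model: SanjiWatsuki/Kunoichi-DPO-v2-7B
    parameters:
      density: 0.6
      weight: 0.5
  - model: maywell/PiVoT-0.1-Evil-a
    parameters:
      density: 0.5
      weight: 0.25
  - model: mlabonne/NeuralOmniBeagle-7B-v2
    parameters:
      density: 0.5
      weight: 0.25
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1  # assumed common Mistral ancestor
dtype: bfloat16
```

Given such a file, mergekit produces the merged checkpoint with its CLI, e.g. `mergekit-yaml config.yaml ./Konstanta-7B`.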

Key Capabilities & Performance

Konstanta-7B demonstrates solid performance across various benchmarks, as evaluated on the Open LLM Leaderboard. It achieves an average score of 73.54, with notable results in:

  • AI2 Reasoning Challenge (25-shot): 70.05
  • HellaSwag (10-shot): 87.50
  • MMLU (5-shot): 65.06
  • TruthfulQA (0-shot): 65.43
  • Winogrande (5-shot): 82.16
  • GSM8k (5-shot): 71.04

These scores indicate proficiency in reasoning, common-sense inference, and general-knowledge tasks. The model operates with a context length of 4096 tokens.

Intended Use

This model is primarily a test merge designed to explore performance improvements through model combination. While its name has Russian origins, the model is not specifically optimized for Russian language use. Developers can integrate Konstanta-7B using standard Hugging Face transformers pipelines for text generation tasks.
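As a minimal sketch, loading the model through the standard transformers text-generation pipeline looks like this; the prompt and sampling parameters below are illustrative, not recommendations from the model authors.

```python
# Minimal sketch: run Inv/Konstanta-7B through the standard transformers
# text-generation pipeline. Generation settings are illustrative defaults.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Inv/Konstanta-7B",
    torch_dtype=torch.float16,  # half precision keeps a 7B model on one 24 GB GPU
    device_map="auto",          # requires the accelerate package
)

output = generator(
    "Explain in one paragraph what a model merge is.",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```

Because the context window is 4096 tokens, prompt plus `max_new_tokens` should stay within that budget.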