Azazelle/xDAN-SlimOrca

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Dec 29, 2023 · License: cc-by-4.0 · Architecture: Transformer · Open weights

Azazelle/xDAN-SlimOrca is a 7-billion-parameter language model by Azazelle, built on the Mistral-7B-v0.1 architecture as a slerp merge of xDAN-L1-Chat-RL-v1 and mistral-7b-slimorcaboros. It targets general conversational tasks, drawing on both parent models for balanced performance across benchmarks, supports a 4096-token context length, and averages 68.04 on the Open LLM Leaderboard.
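The slerp (spherical linear interpolation) merge mentioned above can be illustrated with a minimal sketch. The `slerp` function below treats each pair of weight tensors as flat vectors and interpolates along the arc between them; this is an illustrative implementation, not mergekit's exact code:

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    # Spherical linear interpolation between two weight tensors,
    # treating each tensor as a flat vector. t=0 returns a, t=1 returns b.
    a_flat, b_flat = a.ravel(), b.ravel()
    a_dir = a_flat / (np.linalg.norm(a_flat) + eps)
    b_dir = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_dir, b_dir), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b
```

Compared with a plain weighted average, slerp follows the arc between the two weight directions, which tends to preserve the scale of the merged tensors better when the parents disagree.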


xDAN-SlimOrca: A Merged 7B Language Model

xDAN-SlimOrca is a 7-billion-parameter language model developed by Azazelle, created through a slerp merge of two models: xDAN-L1-Chat-RL-v1 and mistral-7b-slimorcaboros. The merge, specified in a mergekit YAML configuration, combines the strengths of its constituent models in a controlled way: separate interpolation weights (t values) are applied to the self_attn and mlp layers, with a fallback value for all other tensors.
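The card's actual mergekit YAML is not reproduced here, but a slerp merge of this shape typically looks like the following sketch. The model paths, layer ranges, t values, and dtype below are placeholders for illustration, not the real configuration:

```yaml
# Illustrative mergekit slerp config; all values are placeholders.
slices:
  - sources:
      - model: xDAN-L1-Chat-RL-v1        # path/repo id is a placeholder
        layer_range: [0, 32]
      - model: mistral-7b-slimorcaboros  # path/repo id is a placeholder
        layer_range: [0, 32]
merge_method: slerp
base_model: xDAN-L1-Chat-RL-v1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # per-layer t schedule (placeholder)
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # per-layer t schedule (placeholder)
    - value: 0.5                     # fallback t for all other tensors
dtype: bfloat16
```

The `filter` entries are what let a merge weight the attention and MLP sublayers differently, as described above.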

Key Capabilities & Performance

Based on the Mistral-7B-v0.1 architecture, xDAN-SlimOrca is designed for general-purpose language understanding and generation. Its performance has been evaluated on the Open LLM Leaderboard, achieving an overall average score of 68.04. Specific benchmark results include:

  • AI2 Reasoning Challenge (25-shot): 65.61
  • HellaSwag (10-shot): 85.70
  • MMLU (5-shot): 63.67
  • TruthfulQA (0-shot): 57.68
  • Winogrande (5-shot): 77.66
  • GSM8k (5-shot): 57.92

These scores indicate a balanced capability across reasoning, common sense, and factual recall tasks. The model's 4096-token context length supports processing moderately long inputs and generating coherent responses.
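The reported leaderboard average is simply the unweighted mean of the six benchmark scores listed above, which can be checked directly:

```python
# The six Open LLM Leaderboard scores listed above.
scores = {
    "ARC (25-shot)": 65.61,
    "HellaSwag (10-shot)": 85.70,
    "MMLU (5-shot)": 63.67,
    "TruthfulQA (0-shot)": 57.68,
    "Winogrande (5-shot)": 77.66,
    "GSM8k (5-shot)": 57.92,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 68.04, matching the reported average
```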

Good For

  • General conversational AI applications requiring a 7B parameter model.
  • Tasks benefiting from a blend of capabilities derived from its merged base models.
  • Developers seeking a Mistral-based model with specific fine-tuning characteristics from the merged components.