Weyaxi/2x-LoRA-Assemble-13B

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Sep 24, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

Weyaxi/2x-LoRA-Assemble-13B is a 13 billion parameter language model based on the Llama 2 architecture, created by merging two instances of oh-yeontaek/llama-2-13B-LoRA-assemble. The model achieves an average score of 51.52 on the Open LLM Leaderboard, evaluated across benchmarks including ARC, HellaSwag, and MMLU. Its unusual construction, a TIES merge of two identical LoRA-assembled models, yields a marginally higher score than the source model. It is suitable for general language understanding and generation tasks where a 13B parameter model is appropriate.


Model Overview

Weyaxi/2x-LoRA-Assemble-13B is a 13 billion parameter language model built on the Llama 2 architecture. What distinguishes it is how it was created: it is a TIES merge of two instances of the oh-yeontaek/llama-2-13B-LoRA-assemble model. The author describes this merging strategy as an accidental discovery, yet it reportedly yielded a marginal improvement of 0.01 points on the leaderboard average.

Performance Benchmarks

The model's capabilities have been evaluated on the Open LLM Leaderboard, achieving an average score of 51.52. Key benchmark results include:

  • ARC (25-shot): 63.65
  • HellaSwag (10-shot): 83.47
  • MMLU (5-shot): 59.82
  • TruthfulQA (0-shot): 55.94
  • Winogrande (5-shot): 76.48
  • GSM8K (5-shot): 9.25
  • DROP (3-shot): 12.01
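The quoted leaderboard average is simply the arithmetic mean of the seven benchmark scores above, which can be checked directly:

```python
# Open LLM Leaderboard scores listed above for Weyaxi/2x-LoRA-Assemble-13B.
scores = {
    "ARC": 63.65,
    "HellaSwag": 83.47,
    "MMLU": 59.82,
    "TruthfulQA": 55.94,
    "Winogrande": 76.48,
    "GSM8K": 9.25,
    "DROP": 12.01,
}

# Unweighted mean across all seven benchmarks.
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # → 51.52
```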

Unique Construction

In the merge configuration, each copy of the source model was assigned its own weight and density parameters, which govern how much of each model's parameter delta is retained and how the retained values are scaled when combined. Merging two identical models is unconventional, and the result suggests there is room to explore novel model combination strategies.
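The per-parameter arithmetic behind a TIES merge (trim low-magnitude deltas, elect a sign per parameter, then average the agreeing values) can be sketched on toy NumPy arrays. This is an illustrative simplification, not the actual merge tooling used for this model; the function name, defaults, and tensor shapes are assumptions:

```python
import numpy as np

def ties_merge(base, deltas, density=0.5, weights=None):
    """Toy TIES-merge sketch: trim, elect sign, disjoint-mean merge.

    base:    base-model parameter tensor
    deltas:  list of per-model deltas (finetuned weights minus base)
    density: fraction of largest-magnitude entries kept per delta
    weights: per-model scaling factors (defaults to all 1.0)
    """
    if weights is None:
        weights = [1.0] * len(deltas)

    # 1. Trim: keep only the top-density fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = int(round(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k] if k > 0 else np.inf
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))

    # 2. Elect: per parameter, pick the sign of the weighted sum of deltas.
    stacked = np.stack([w * t for w, t in zip(weights, trimmed)])
    elected = np.sign(stacked.sum(axis=0))

    # 3. Merge: average only the entries that agree with the elected sign.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    merged = np.where(
        agree.any(axis=0),
        stacked.sum(axis=0, where=agree) / np.maximum(agree.sum(axis=0), 1),
        0.0,
    )
    return base + merged
```

Merging two identical copies, as done here, leaves each surviving parameter unchanged (every value agrees with its own sign), which is consistent with the near-identical score of the merged model.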

Use Cases

Given its Llama 2 foundation and general-purpose benchmark scores, this model is suitable for a range of natural language processing tasks, including text generation, question answering, and general conversational AI applications, particularly where a 13B parameter model fits computational constraints.