Sakalti/ultiima-72B

  • Parameters: 72.7B
  • Precision: FP8
  • Context length: 131,072 tokens
  • Date: Jan 9, 2025
  • License: other
  • Hosted on: Hugging Face

Overview

Sakalti/ultiima-72B: A Merged Qwen2.5 Model

Sakalti/ultiima-72B is a 72.7 billion parameter language model built upon the Qwen2.5 architecture. This model was created using the TIES merge method, combining the strengths of the base model Qwen/Qwen2.5-72B with the instruction-tuned variant Qwen/Qwen2.5-72B-Instruct.

Key Capabilities & Performance

This model demonstrates strong performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. Its average score is 46.58, with notable results in specific areas:

  • IFEval (0-Shot): 71.40
  • BBH (3-Shot): 61.10
  • MATH Lvl 5 (4-Shot): 52.42
  • MMLU-PRO (5-Shot): 54.51

With a context length of 131,072 tokens, ultiima-72B is well suited to tasks that require understanding and generating over long inputs.
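
Because the merge preserves the stock Qwen2.5 architecture, the checkpoint should load through the Hugging Face transformers library like any other Qwen2.5 model. The sketch below makes that assumption; the chat prompt and generation settings are illustrative, and a 72B model needs multi-GPU or offloaded inference.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sakalti/ultiima-72B"

# device_map="auto" (via accelerate) shards the 72B weights across the
# available GPUs or offloads layers to CPU when memory runs short.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Assumes the tokenizer ships the usual Qwen2.5 chat template.
messages = [{"role": "user", "content": "Summarize the TIES merge method in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```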

Merge Details

The model was constructed with mergekit, specifically employing the TIES (TrIm, Elect Sign & Merge) technique. Qwen/Qwen2.5-72B-Instruct was the primary component of the merge, with Qwen/Qwen2.5-72B serving as the base model. TIES operates on parameter deltas relative to the base: each delta is trimmed to its largest-magnitude entries, a dominant sign is elected per parameter, and only the values that agree with that sign are averaged back onto the base weights, with the aim of consolidating the capabilities of the constituent models. A conceptual sketch of these steps follows below.
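
The snippet below is a conceptual illustration of those TIES steps on a single parameter tensor, written with plain torch operations. It is not the mergekit implementation, and the density and weight values are placeholders rather than the settings used for this merge.

```python
import torch

def ties_merge(base: torch.Tensor, tuned: list[torch.Tensor],
               density: float = 0.5, weight: float = 1.0) -> torch.Tensor:
    """Conceptual TIES merge of one parameter tensor (not mergekit's code)."""
    # 1. Task vectors: each tuned model's delta relative to the base.
    deltas = [t - base for t in tuned]

    # 2. Trim: keep only the top `density` fraction of entries by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))

    # 3. Elect sign: per parameter, the sign with the larger total magnitude wins.
    stacked = torch.stack(trimmed)
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 4. Disjoint merge: average only the entries whose sign matches the vote.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)

    # 5. Add the merged task vector back onto the base weights.
    return base + weight * merged
```

With a single instruction-tuned model merged onto its own base, as here, the sign election is trivial and the procedure mainly trims and rescales the Instruct delta; the multi-model case is where sign conflicts matter.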

Good For

  • Applications requiring a large-scale, general-purpose language model.
  • Tasks benefiting from a long context window.
  • Scenarios where strong instruction following and reasoning capabilities are important.