brgx53/3Blarenegv3-ECE-PRYMMAL-Martial

Text generation · Model size: 7.6B · Quant: FP8 · Context length: 32K · Published: Nov 8, 2024 · License: apache-2.0 · Architecture: Transformer

brgx53/3Blarenegv3-ECE-PRYMMAL-Martial is a 7.6-billion-parameter language model created by brgx53 using the SLERP merge method. It combines fblgit/cybertron-v4-qw7B-MGS and Tsunami-th/Tsunami-0.5x-7B-Instruct and features a 131,072-token context length. The model is designed for general-purpose applications and demonstrates balanced performance across benchmarks including IFEval, BBH, and MMLU-PRO.


Overview

brgx53/3Blarenegv3-ECE-PRYMMAL-Martial is a 7.6-billion-parameter language model developed by brgx53. It was created using the SLERP merge method, combining two base models: fblgit/cybertron-v4-qw7B-MGS and Tsunami-th/Tsunami-0.5x-7B-Instruct. The merge configuration applies varying interpolation factors across the self-attention and MLP layers, with a general value of 0.5, and uses bfloat16 as its dtype.
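To make the merge method concrete, here is a minimal sketch of SLERP (spherical linear interpolation) applied to a pair of weight tensors, assuming PyTorch. It illustrates the idea rather than reproducing mergekit's actual implementation; the slerp helper and the commented tensor names are hypothetical.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w0, t=1 returns w1; intermediate values follow the
    great-circle arc between the flattened tensors.
    """
    v0 = w0.flatten().float()
    v1 = w1.flatten().float()
    # Angle between the two tensors, treated as vectors.
    cos_theta = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:
        # Nearly parallel: fall back to plain linear interpolation.
        merged = (1 - t) * v0 + t * v1
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1 - t) * theta) / sin_theta) * v0 \
               + (torch.sin(t * theta) / sin_theta) * v1
    return merged.reshape(w0.shape).to(w0.dtype)

# Per the model card, self-attention and MLP tensors use varying
# interpolation factors, with 0.5 as the general value, e.g.:
# merged_weight = slerp(0.5, cybertron_weight, tsunami_weight)
```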

Key Capabilities

This merged model demonstrates a balanced performance profile across several benchmarks, as evaluated on the Open LLM Leaderboard (a reproduction sketch follows the list):

  • IFEval (0-shot): Achieves 56.77, indicating proficiency in instruction following.
  • BBH (3-shot): Scores 37.25, reflecting its ability on Big-Bench Hard tasks.
  • MATH Lvl 5 (4-shot): Reaches 30.74, showing some capability in mathematical reasoning.
  • MMLU-PRO (5-shot): Scores 38.95, suggesting general knowledge and understanding across various subjects.
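Scores of this kind can be approximated locally with EleutherAI's lm-evaluation-harness, the backend the Open LLM Leaderboard uses. The sketch below is an assumption-laden example: the leaderboard task names and settings vary by harness version, so treat them as placeholders rather than an exact reproduction recipe.

```python
# Hedged sketch: evaluate the model with lm-evaluation-harness.
# Task names follow the leaderboard task group in recent harness
# versions and may differ in yours.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=brgx53/3Blarenegv3-ECE-PRYMMAL-Martial,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_bbh", "leaderboard_mmlu_pro"],
    batch_size=8,
)
print(results["results"])
```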

Good For

Given its balanced performance across diverse benchmarks, this model is suitable for:

  • General-purpose applications requiring a blend of instruction following, reasoning, and knowledge recall.
  • Experimentation with merged models for developers interested in the SLERP method and its outcomes (a minimal loading sketch follows this list).
  • Use cases where a 7.6-billion-parameter model with a 131,072-token context length fits the computational and performance requirements.
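As a starting point, here is a minimal sketch of loading the model with Hugging Face transformers, assuming the repository ships a chat template inherited from its Qwen2.5-based parents; the prompt and generation parameters are illustrative, not recommendations from the model card.

```python
# Minimal sketch: load the merged model and generate a reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "brgx53/3Blarenegv3-ECE-PRYMMAL-Martial"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype noted above
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize SLERP model merging in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```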