gmonsoon/SahabatAI-Lion-9B-TIES-v1

TEXT GENERATIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:16kPublished:Nov 16, 2024License:gemmaArchitecture:Transformer0.0K Cold

gmonsoon/SahabatAI-Lion-9B-TIES-v1 is a 9 billion parameter language model created by gmonsoon, built by merging two Gemma2-9B-based instruction-tuned models using the TIES method. This model is optimized for general instruction following and achieves strong performance, ranking as a top model under 10B parameters on the Hugging Face Open LLM Leaderboard. It is suitable for a wide range of natural language processing tasks requiring robust instruction adherence.

Loading preview...

SahabatAI-Lion-9B-TIES-v1 Overview

gmonsoon/SahabatAI-Lion-9B-TIES-v1 is a 9 billion parameter instruction-tuned language model developed by gmonsoon. It was created by merging two distinct Gemma2-9B-based instruction-tuned models: GoToCompany/gemma2-9b-cpt-sahabatai-v1-instruct and aisingapore/gemma2-9b-cpt-sea-lionv3-instruct, utilizing the TIES (Trimmed, Iterative, and Selective) merging method. This approach aims to combine the strengths of its constituent models to achieve enhanced performance.

Key Capabilities & Performance

  • Optimized Merging: Leverages the TIES method, which research suggests can lead to improved outputs when merging fine-tuned models with their base models.
  • Strong Leaderboard Performance: As of November 2024, this model ranks as the third-best model overall and the top Gemma2-9B based model on the Hugging Face Open LLM Leaderboard for models under 10 billion parameters (excluding Merge/MoE models).
  • Instruction Following: Designed for general instruction-following tasks, making it versatile for various NLP applications.

Benchmarks

Evaluated on the Open LLM Leaderboard, SahabatAI-Lion-9B-TIES-v1 demonstrates competitive results:

  • Average Score: 33.70
  • IFEval (0-Shot): 73.78
  • BBH (3-Shot): 43.40
  • MMLU-PRO (5-shot): 37.19

Good For

  • Developers seeking a high-performing 9B parameter model for general instruction-following tasks.
  • Applications requiring a balance of performance and efficiency within the sub-10B parameter range.
  • Experimentation with models built using advanced merging techniques like TIES.