johnsutor/mixture-of-llamas-ties

Hugging Face
Text Generation · 8B parameters · FP8 quantization · 8k context length · License: apache-2.0 · Architecture: Transformer · Open weights

The johnsutor/mixture-of-llamas-ties model is an 8-billion-parameter instruction-tuned language model created by johnsutor, built upon the Meta-Llama-3-8B-Instruct base. It was produced with the TIES merge method, which combines several specialized Llama-3-8B-Instruct variants into a single model. The aim is to retain the strengths of each constituent model, yielding versatile instruction-following with an 8192-token context length.


Overview

The johnsutor/mixture-of-llamas-ties model is an 8 billion parameter instruction-tuned language model. It was created by johnsutor using the TIES merge method via mergekit, with meta-llama/Meta-Llama-3-8B-Instruct serving as its foundational base model.

Merge Details

This model is a composite of several specialized Llama-3-8B-Instruct variants, each contributing to its overall capabilities. The merge process involved combining the following models:

  • VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
  • DeepMount00/Llama-3-8b-Ita
  • failspy/Meta-Llama-3-8B-Instruct-abliterated-v3
  • jpacifico/French-Alpaca-Llama3-8B-Instruct-v1.0
  • nbeerbower/llama-3-gutenberg-8B

The TIES merge method was applied with per-model density and weight parameters, trimming each model's weight deltas and resolving sign conflicts so that the constituents' distinct characteristics combine with minimal interference. The tokenizer source was unified across the models, and the merge was configured to use the bfloat16 data type with int8_mask enabled.
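A mergekit TIES merge of this shape is typically driven by a YAML configuration like the sketch below. The density and weight values shown are illustrative placeholders, not the actual parameters used for this model, and the `tokenizer_source` value is an assumption:

```yaml
merge_method: ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct
models:
  - model: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
    parameters:
      density: 0.5   # placeholder: fraction of delta weights kept after trimming
      weight: 0.2    # placeholder: contribution to the merged delta
  - model: DeepMount00/Llama-3-8b-Ita
    parameters:
      density: 0.5
      weight: 0.2
  # ...the remaining constituent models follow the same pattern
dtype: bfloat16
parameters:
  int8_mask: true
tokenizer_source: union  # assumption: one common way to unify tokenizers
```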

Key Characteristics

  • Architecture: Based on the Llama-3-8B-Instruct family.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports an 8192-token context window.
  • Merge Method: Utilizes the TIES method (TrIm, Elect Sign & Merge) for combining models.
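As a rough illustration of the TIES procedure named above, the toy sketch below (a simplification for intuition, not mergekit's actual implementation) applies the three steps, trim, elect sign, and merge, to small weight vectors:

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Toy TIES merge: trim, elect sign, disjoint merge.

    base      -- base-model weights, shape (d,)
    finetuned -- list of fine-tuned weight vectors, each shape (d,)
    density   -- fraction of largest-magnitude delta entries kept per model
    """
    deltas = [ft - base for ft in finetuned]

    # 1. Trim: zero out all but the top-`density` fraction of each delta.
    trimmed = []
    for delta in deltas:
        k = max(1, int(round(density * delta.size)))
        thresh = np.sort(np.abs(delta))[-k]
        trimmed.append(np.where(np.abs(delta) >= thresh, delta, 0.0))
    stacked = np.stack(trimmed)

    # 2. Elect sign: per parameter, keep the sign with the larger total mass.
    elected = np.sign(stacked.sum(axis=0))

    # 3. Disjoint merge: average only entries agreeing with the elected sign.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = (stacked * agree).sum(axis=0) / counts
    return base + merged_delta
```

The sign-election step is what distinguishes TIES from a plain weighted average: parameters where the fine-tuned models pull in opposite directions are reconciled rather than cancelled out.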

Potential Use Cases

Given its foundation in multiple Llama-3-8B-Instruct derivatives, this model is likely suited to a broad range of instruction-following tasks. The names of its constituent models suggest particular strengths in German (SauerkrautLM), Italian (Llama-3-8b-Ita), and French (French-Alpaca), along with long-form prose (gutenberg) and less restrictive responses (abliterated), though these specializations are inferred from the components rather than independently benchmarked.
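Since every constituent is a Llama-3-8B-Instruct derivative, prompts should follow the Llama-3-Instruct chat format. The helper below is a minimal sketch of that format; in practice you would load the model with the transformers library and call `tokenizer.apply_chat_template` rather than assembling the string by hand:

```python
# Sketch: building a Llama-3-Instruct chat prompt manually.
# The special tokens follow the published Llama-3 prompt format.
def build_llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"  # generation starts here
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "Summarize what TIES merging does.",
)
```

The trailing assistant header leaves the prompt open for the model to complete; generation should stop at the `<|eot_id|>` token.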