Overview
The johnsutor/mixture-of-llamas-ties model is an 8 billion parameter instruction-tuned language model. It was created by johnsutor using the TIES merge method via mergekit, with meta-llama/Meta-Llama-3-8B-Instruct serving as its foundational base model.
Merge Details
This model is a composite of several specialized Llama-3-8B-Instruct variants, each contributing to its overall capabilities. The merge process involved combining the following models:
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- DeepMount00/Llama-3-8b-Ita
- failspy/Meta-Llama-3-8B-Instruct-abliterated-v3
- jpacifico/French-Alpaca-Llama3-8B-Instruct-v1.0
- nbeerbower/llama-3-gutenberg-8B
The TIES merge method was applied with per-model density and weight parameters that control how much each constituent contributes to the final weights. The tokenizer source was unified across the constituent models, and the merge was performed in bfloat16 with int8_mask enabled.
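The exact density and weight values are not reproduced here. As a rough illustration, a mergekit TIES configuration for this merge could look like the following sketch; the density and weight values and the `union` tokenizer source are placeholders, not the settings actually used:

```yaml
# Hypothetical mergekit TIES config; density/weight values are placeholders.
models:
  - model: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
    parameters:
      density: 0.5   # fraction of delta parameters kept after trimming (assumed)
      weight: 0.2    # relative contribution of this model (assumed)
  - model: DeepMount00/Llama-3-8b-Ita
    parameters:
      density: 0.5
      weight: 0.2
  - model: failspy/Meta-Llama-3-8B-Instruct-abliterated-v3
    parameters:
      density: 0.5
      weight: 0.2
  - model: jpacifico/French-Alpaca-Llama3-8B-Instruct-v1.0
    parameters:
      density: 0.5
      weight: 0.2
  - model: nbeerbower/llama-3-gutenberg-8B
    parameters:
      density: 0.5
      weight: 0.2
merge_method: ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tokenizer_source: union   # "unified" tokenizer; exact setting assumed
dtype: bfloat16
parameters:
  int8_mask: true
```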
Key Characteristics
- Architecture: Based on the Llama-3-8B-Instruct family.
- Parameter Count: 8 billion parameters.
- Context Length: Supports an 8192-token context window.
- Merge Method: Utilizes the TIES (TrIm, Elect Sign & Merge) method for combining models.
Potential Use Cases
Given its foundation in multiple Llama-3-8B-Instruct derivatives, including German- (SauerkrautLM), Italian- (Llama-3-8b-Ita), and French-tuned (French-Alpaca) variants, this model is likely suitable for a broad range of instruction-following tasks and may inherit some of the specializations of its merged components.
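As a merge of standard Llama-3-8B-Instruct checkpoints, it should load like any other Llama-3 model. The following is a minimal usage sketch with transformers, assuming the model is hosted on the Hugging Face Hub under johnsutor/mixture-of-llamas-ties and a GPU is available:

```python
# Minimal usage sketch; assumes the repository id below exists on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "johnsutor/mixture-of-llamas-ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

# Llama-3-Instruct derivatives are prompted through a chat template.
messages = [
    {"role": "user", "content": "Summarize the TIES merge method in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```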