mlabonne/Llama-3-SLERP-8B
Llama-3-SLERP-8B is an 8-billion-parameter language model created by mlabonne by merging Meta-Llama-3-8B and Meta-Llama-3-8B-Instruct with the SLERP merge method. The model combines base Llama 3 capabilities with instruction-tuned behavior, offering a balanced foundation for a range of generative tasks. It applies distinct interpolation weights to the self-attention and MLP layers to shape the merged model's characteristics.
Overview
Llama-3-SLERP-8B is an 8-billion-parameter language model developed by mlabonne. It is the product of merging two foundational Meta Llama 3 models: the base Meta-Llama-3-8B and its instruction-tuned counterpart, Meta-Llama-3-8B-Instruct. The merge was performed with SLERP (Spherical Linear Interpolation), which interpolates along the arc between two models' weight vectors rather than averaging them linearly; this tends to preserve the characteristics of both parents better than a plain weighted average.
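Merges like this are typically specified declaratively, for example with the mergekit library. The sketch below mirrors the shape of a mergekit SLERP configuration as a Python dict; the model ids come from the description above, but the per-layer t schedules are illustrative placeholders, not the values actually used to build this model.

```python
# Hypothetical merge specification, mirroring the shape of a mergekit
# SLERP config. The t schedules below are illustrative placeholders,
# NOT the values actually used to produce Llama-3-SLERP-8B.
merge_config = {
    "merge_method": "slerp",
    "base_model": "meta-llama/Meta-Llama-3-8B",
    "models": [
        "meta-llama/Meta-Llama-3-8B",
        "meta-llama/Meta-Llama-3-8B-Instruct",
    ],
    "parameters": {
        "t": [
            # Each list is interpolated across the layer stack, so the
            # merge can lean toward different parents at different depths.
            {"filter": "self_attn", "value": [0.0, 0.5, 0.3, 0.7, 1.0]},
            {"filter": "mlp",       "value": [1.0, 0.5, 0.7, 0.3, 0.0]},
            {"value": 0.5},  # default t for all remaining tensors
        ]
    },
    "dtype": "bfloat16",
}
```

Giving the self-attention and MLP blocks separate t schedules is what lets a merge like this pull attention behavior and feed-forward behavior from different parents at different depths.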
Key Characteristics
- Model Architecture: Based on the Llama 3 family, providing a robust and widely recognized foundation.
- Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
- Merge Method: Utilizes SLERP, applying different interpolation weights (t values) to the self-attention and MLP layers. This fine-grained control over the merge selectively blends features from the base and instruction-tuned models (see the sketch after this list).
- Context Length: Supports an 8192-token context window, suitable for handling moderately long inputs and generating comprehensive responses.
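For intuition, here is a minimal NumPy sketch of the SLERP operation applied to a pair of weight tensors. It is illustrative only, not the code used to produce this model; real merge tooling additionally handles per-tensor filters, tokenizer alignment, and dtype conversion.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    The angle is computed between the normalized tensors, and the
    interpolation is applied to the original (unnormalized) values.
    Falls back to plain linear interpolation when the tensors are
    nearly colinear, where SLERP is numerically unstable.
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    v0_n = v0_flat / (np.linalg.norm(v0_flat) + eps)
    v1_n = v1_flat / (np.linalg.norm(v1_flat) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    if abs(dot) > 1.0 - eps:  # nearly parallel: lerp is stable here
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(dot)    # angle between the two weight tensors
    s0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return (s0 * v0_flat + s1 * v1_flat).reshape(v0.shape)

# Toy example: blend two small random "weight" tensors at t = 0.3.
rng = np.random.default_rng(0)
w_base, w_instruct = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
merged = slerp(0.3, w_base, w_instruct)
```

Because the interpolation follows the arc between the two tensors rather than the straight line, the merged weights keep a magnitude closer to the parents' than a naive average would.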
Good For
- General-purpose text generation: Benefits from the combined capabilities of a strong base model and an instruction-tuned variant.
- Applications requiring a blend of foundational knowledge and instruction following: The SLERP merge is designed to integrate both aspects effectively.
- Developers looking for a Llama 3 variant with optimized instruction adherence: The specific merge configuration targets improved instruction-following without sacrificing the base model's knowledge.
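A minimal sketch for trying the model with Hugging Face transformers, assuming it is published under the repo id shown in the title; the generation settings are illustrative defaults, not recommendations from the model author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Llama-3-SLERP-8B"  # repo id from the title; assumed hosted on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights; fits a 24 GB GPU in bf16
    device_map="auto",
)

prompt = "Explain spherical linear interpolation in one short paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```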