arcee-ai/Llama-3-Base-Instruct-Slerp

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8K · Published: Apr 18, 2024 · License: apache-2.0 · Architecture: Transformer

The arcee-ai/Llama-3-Base-Instruct-Slerp is an 8-billion-parameter language model created by arcee-ai, formed by merging Meta-Llama-3-8B and Meta-Llama-3-8B-Instruct using the SLERP (spherical linear interpolation) method. The merge combines the base Llama 3 capabilities with instruction-following fine-tuning, offering balanced performance for general conversational AI tasks. It supports a context length of 8,192 tokens, making it suitable for applications requiring moderate context understanding and generation.


Model Overview

The arcee-ai/Llama-3-Base-Instruct-Slerp is an 8-billion-parameter language model developed by arcee-ai. It is a merged model, combining the strengths of two foundational Meta Llama 3 models: meta-llama/Meta-Llama-3-8B and meta-llama/Meta-Llama-3-8B-Instruct. The merge was performed with the SLERP method via mergekit, aiming to create a model that benefits from both the raw capabilities of the base model and the instruction-following prowess of the instruct-tuned variant.
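To make the merge operation concrete, here is a minimal sketch of SLERP applied to a single pair of weight tensors. This illustrates the underlying math only; it is not mergekit's actual implementation, and the torch-based helper and demo shapes are assumptions for the example.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the great-circle arc between w0 and w1 rather than
    along the straight line, which preserves the norm geometry of the
    weights better than plain linear interpolation.
    """
    v0 = w0.flatten().float()
    v1 = w1.flatten().float()
    # Angle between the two weight vectors.
    cos_theta = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    cos_theta = cos_theta.clamp(-1.0, 1.0)
    theta = torch.acos(cos_theta)
    sin_theta = torch.sin(theta)
    if sin_theta.abs() < eps:
        # Vectors are (nearly) colinear: fall back to linear interpolation.
        merged = (1.0 - t) * v0 + t * v1
    else:
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * v0 \
               + (torch.sin(t * theta) / sin_theta) * v1
    return merged.reshape(w0.shape).to(w0.dtype)

# Tiny demo on random "weights"; t=0.5 sits halfway along the arc.
a = torch.randn(4, 4)
b = torch.randn(4, 4)
print(slerp(0.5, a, b))
```

In a full merge, this interpolation would run over every matching parameter tensor in the two checkpoints.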

Key Characteristics

  • Architecture: Based on the Llama 3 family, specifically the 8B parameter variant.
  • Merging Method: Utilizes SLERP for combining model weights, with distinct t (interpolation) parameters applied to different layer groups (self-attention, MLP) to tune the merge outcome; see the sketch after this list.
  • Base Models: Integrates Meta-Llama-3-8B for foundational language understanding and Meta-Llama-3-8B-Instruct for enhanced instruction following.
  • Context Length: Supports an 8192-token context window.
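The per-filter t values mentioned above can be pictured as a simple lookup over parameter names. The sketch below is a hypothetical illustration of that idea: the filter strings match Llama parameter naming, but the specific t values are assumptions, not the configuration arcee-ai actually used.

```python
# Hypothetical per-filter t schedule, mirroring how a mergekit config can
# assign different interpolation weights to attention vs. MLP parameters.
# The t values here are illustrative assumptions only.
T_SCHEDULE = [
    ("self_attn", 0.3),  # lean toward the base model for attention weights
    ("mlp", 0.7),        # lean toward the instruct model for MLP weights
]
DEFAULT_T = 0.5          # everything else is merged halfway

def t_for_param(param_name: str) -> float:
    """Pick the interpolation factor for a named parameter tensor."""
    for filt, t in T_SCHEDULE:
        if filt in param_name:
            return t
    return DEFAULT_T

# Example: parameter names as they appear in a Llama state dict.
for name in [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.embed_tokens.weight",
]:
    print(f"{name} -> t = {t_for_param(name)}")
```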

Use Cases

This model is well-suited for applications that need a balance between general language understanding and reliable instruction following; a minimal inference sketch follows the list. It can be used for:

  • General-purpose conversational agents.
  • Text generation tasks where instruction adherence is important.
  • Applications benefiting from the combined strengths of a base and an instruct-tuned model without the overhead of larger models.
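For reference, here is a minimal inference sketch using Hugging Face transformers. It assumes the model is available under the arcee-ai/Llama-3-Base-Instruct-Slerp repository, that the merged tokenizer ships the Llama 3 chat template, and that enough GPU memory is available for an 8B model; treat it as a starting point rather than an official usage recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Llama-3-Base-Instruct-Slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "user",
     "content": "Summarize the benefits of model merging in two sentences."},
]
# apply_chat_template formats the conversation with the model's chat template
# (assumed here to be the Llama 3 template inherited from the instruct parent).
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```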