mlabonne/Llama-3-12B

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 15B
  • Quant: FP8
  • Ctx Length: 8k
  • Tool Calling: Supported
  • Published: Apr 18, 2024
  • License: other
  • Architecture: Transformer

Llama-3-12B by mlabonne is a language model, listed here at 15 billion parameters, created by merging two instances of Meta's Llama-3-8B with a passthrough merge. It keeps the Llama-3 architecture and targets general-purpose text generation and understanding tasks. The merge stacks overlapping layer ranges from the two instances into a single, deeper model.


Overview

Llama-3-12B is a language model developed by mlabonne and listed here at 15 billion parameters. It is built from two instances of Meta's Llama-3-8B model using a passthrough merge performed with LazyMergekit. The merge takes a different layer range from each instance and stacks them to form a new, larger model.

Key Characteristics

  • Architecture: Based on Meta's Llama-3 family.
  • Parameter Count: Listed as 15 billion parameters, larger than the 8B base model it is built from.
  • Merge Method: A passthrough merge that stacks layer range [0, 24] from one Llama-3-8B instance on top of layer range [8, 32] from another, copying the weights unchanged rather than averaging them.
  • Data Type: The merge is configured to use bfloat16.
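
The layer stacking described above can be sketched in a few lines of Python. This is an illustrative calculation, not the author's merge script: the Llama-3-8B dimensions (hidden size 4096, MLP size 14336, 32 layers, 32 attention heads, 8 key/value heads, vocabulary 128256) are assumptions that do not appear on this page, and the layer ranges are read as half-open intervals, which is how mergekit interprets `layer_range`.

```python
# Assumed Llama-3-8B dimensions (not stated on this page).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
VOCAB = 128256

# Layer ranges from the Merge Method bullet, read as half-open [start, end).
SLICES = [(0, 24), (8, 32)]


def stacked_layers(slices):
    """Return the source-layer index used at each position of the merged model."""
    layers = []
    for start, end in slices:
        layers.extend(range(start, end))
    return layers


def params_per_layer():
    """Parameter count of one decoder layer under the assumed dimensions."""
    head_dim = HIDDEN // NUM_HEADS
    kv_dim = NUM_KV_HEADS * head_dim
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                         # gate, up, down projections
    norms = 2 * HIDDEN                                      # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params(num_layers):
    """Total parameters for a model with `num_layers` decoder layers."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings plus an untied output head
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm


merged = stacked_layers(SLICES)
print(len(merged))                       # depth of the merged model
print(total_params(NUM_LAYERS) / 1e9)    # base model size in billions
print(total_params(len(merged)) / 1e9)   # merged model size in billions
```

Under these assumptions the merged model has 48 decoder layers, with source layers 8 through 23 appearing twice, and roughly 11.5 billion parameters. That is closer to the "12B" in the model name than to the 15 billion figure listed on this page.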

Potential Use Cases

As a Llama-3 derivative with more layers than its base model, Llama-3-12B can be used for general natural language processing tasks, including:

  • General text generation and completion.
  • Question answering and summarization.
  • Chatbot development and conversational AI.
  • Experimentation with merged, layer-stacked model architectures.
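
For text generation, a minimal sketch using the Hugging Face `transformers` pipeline might look like the following. It assumes the weights are published under the repository name shown at the top of this page, that `torch`, `transformers`, and `accelerate` are installed, and that enough GPU memory is available. The helper names and sampling values are hypothetical choices for illustration, not settings recommended by the author.

```python
MODEL_ID = "mlabonne/Llama-3-12B"  # repository name as shown on this page


def build_generation_kwargs(max_new_tokens=256, temperature=0.7, top_p=0.95):
    """Return sampling settings for generation (illustrative values only)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def generate(prompt, **overrides):
    """Load the model and complete `prompt`; downloads weights on first use."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        model_kwargs={"torch_dtype": torch.bfloat16},  # matches the bfloat16 setting above
        device_map="auto",
    )
    settings = {**build_generation_kwargs(), **overrides}
    outputs = pipe(prompt, **settings)
    return outputs[0]["generated_text"]
```

Calling `generate("Summarize the idea of a passthrough merge in one sentence:")` returns the prompt followed by the model's continuation. The sketch passes a plain prompt rather than a chat template, since this page does not say whether the merged model is instruction-tuned.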