unsloth/Llama-3.2-3B-Instruct
Hugging Face

Text generation · Model size: 3.2B · Quantization: BF16 · Context length: 32k · Concurrency cost: 1 · Published: Sep 25, 2024 · License: llama3.2 · Architecture: Transformer

unsloth/Llama-3.2-3B-Instruct is a 3.2-billion-parameter, instruction-tuned generative language model developed by Meta and optimized for multilingual dialogue use cases. Part of the Llama 3.2 collection, it performs well on agentic retrieval and summarization tasks. It uses an optimized transformer architecture and supports a context length of 32,768 tokens. Unsloth provides optimized versions of this model that enable fine-tuning up to 2.4x faster with 58% less memory.


unsloth/Llama-3.2-3B-Instruct Overview

This model is an instruction-tuned variant of Meta's Llama 3.2, a multilingual large language model with 3.2 billion parameters. It is specifically optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. The model uses an optimized transformer architecture and has been aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
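For orientation, basic inference can be sketched with the Hugging Face `transformers` library (a minimal example, not from the model card; it assumes `transformers` and `torch` are installed and that there is enough memory for the BF16 weights, roughly 7 GB):

```python
# Minimal inference sketch using the transformers text-generation pipeline.
# Assumes network access to download the weights from the Hub.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="unsloth/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Instruction-tuned Llama models expect the standard chat "messages" format;
# the pipeline applies the model's chat template automatically.
messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize this paragraph in one sentence: ..."},
]

output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```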

Key Capabilities & Features

  • Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader set of languages.
  • Optimized for Dialogue: Designed for conversational applications, retrieval, and summarization.
  • Efficient Fine-tuning: Unsloth provides versions of this model that enable fine-tuning up to 2.4x faster with 58% less memory compared to standard methods.
  • Context Length: Supports a substantial context length of 32768 tokens.
  • Architecture: Based on an auto-regressive language model with an optimized transformer architecture, featuring Grouped-Query Attention (GQA) for improved inference scalability.

When to Use This Model

This model is particularly well-suited for developers looking to build applications requiring efficient, multilingual dialogue capabilities, especially in scenarios involving agentic retrieval or summarization. Its optimization for faster and more memory-efficient fine-tuning via Unsloth makes it an attractive option for resource-constrained environments or rapid prototyping.