PASI1028/Llama-3.2-3B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2025 · License: llama3.2 · Architecture: Transformer · Warm

PASI1028/Llama-3.2-3B-Instruct is a 3.21 billion parameter instruction-tuned generative language model developed by Meta, part of the Llama 3.2 family. Optimized for multilingual dialogue use cases, it excels in agentic retrieval and summarization tasks. This model features an optimized transformer architecture with Grouped-Query Attention (GQA) and supports a 32K context length, trained on up to 9 trillion tokens of publicly available online data with a knowledge cutoff of December 2023. It is designed for commercial and research use, particularly in assistant-like chat and agentic applications.


Overview

PASI1028/Llama-3.2-3B-Instruct is a 3.21 billion parameter instruction-tuned model from Meta's Llama 3.2 family, designed for multilingual text-in/text-out generation. It utilizes an optimized transformer architecture with Grouped-Query Attention (GQA) and has a context length of 32,768 tokens. The model was trained on up to 9 trillion tokens of publicly available data, incorporating knowledge distillation from larger Llama 3.1 models and aligned using Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
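As a concrete sketch, the model can be served for chat through the Hugging Face `transformers` text-generation pipeline. This assumes `transformers` and `torch` are installed and the weights are accessible; the helper names below (`build_messages`, `run_chat`) are illustrative, not part of the model's API.

```python
def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Build the chat-format message list the pipeline expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def run_chat(user_prompt: str) -> str:
    """Generate a reply; imports are deferred so the helper above stays dependency-free."""
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="PASI1028/Llama-3.2-3B-Instruct",
        torch_dtype=torch.bfloat16,  # BF16, matching the quant listed in the card metadata
        device_map="auto",
    )
    out = generator(build_messages(user_prompt), max_new_tokens=256)
    # The pipeline echoes the full message list; the last entry is the assistant reply.
    return out[0]["generated_text"][-1]["content"]
```

Passing the message list (rather than a raw string) lets the pipeline apply the model's chat template automatically.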

Key Capabilities

  • Multilingual Dialogue: Optimized for assistant-like chat and agentic applications in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Agentic Tasks: Strong performance in knowledge retrieval and summarization.
  • Quantization Support: Features various quantization schemes (SpinQuant, QLoRA) for efficient deployment in constrained environments, such as mobile devices, significantly reducing model size and improving inference speed.
  • Robust Safety: Developed with a comprehensive safety strategy, including extensive fine-tuning, adversarial evaluations, and red teaming to mitigate critical risks in areas such as CBRNE, child safety, and cyberattacks.
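The SpinQuant and QLoRA checkpoints mentioned above are released separately; as a generic stand-in, a 4-bit load via `transformers`' `BitsAndBytesConfig` illustrates the same memory-saving idea. This is a sketch assuming `bitsandbytes` is installed; the helper names are illustrative.

```python
def load_quantized(repo_id: str = "PASI1028/Llama-3.2-3B-Instruct"):
    """Load the model with 4-bit NF4 weight quantization (imports deferred)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tok = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, quantization_config=bnb, device_map="auto"
    )
    return tok, model

def weight_bytes(params: float, bits: int) -> float:
    """Rough weight-memory estimate: parameter count times bits per weight."""
    return params * bits / 8
```

Back-of-envelope: at 3.21B parameters, 4-bit weights need roughly 1.6 GB versus about 6.4 GB in BF16, which is what makes on-device deployment plausible.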

Intended Use Cases

This model is suitable for commercial and research applications requiring efficient, multilingual conversational AI. It is particularly well-suited for:

  • Assistant-like chat applications.
  • Agentic systems for knowledge retrieval and summarization.
  • Mobile AI-powered writing assistants.
  • Query and prompt rewriting.
  • On-device use cases with limited compute resources, especially with its quantized versions.
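For use cases like query and prompt rewriting, the tokenizer's chat template normally handles formatting; when targeting a raw completion endpoint instead, the prompt can be assembled by hand in the Llama 3-family header format. The sketch below assumes that format applies to this checkpoint; verify against the tokenizer's `chat_template` before relying on it.

```python
def llama3_prompt(system: str, user: str) -> str:
    """Assemble a raw prompt in the Llama 3-family instruct format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example: a query-rewriting prompt, one of the use cases listed above.
prompt = llama3_prompt(
    "Rewrite the user's query to be clearer and more specific. "
    "Reply with only the rewritten query.",
    "cheap gpu run 3b llama local",
)
```

The trailing assistant header cues the model to begin its reply, so generation continues from exactly that point.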