context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

Hugging Face
TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 3.2B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: Feb 22, 2025
  • License: llama3.2
  • Architecture: Transformer

The Llama 3.2-3B-Instruct-FP16 model, developed by Meta, is a 3.21-billion-parameter instruction-tuned multilingual large language model with a 32,768-token context length. Optimized for multilingual dialogue, it excels at agentic retrieval, summarization, and chat applications. The model uses an optimized transformer architecture with Grouped-Query Attention (GQA) and is fine-tuned with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) for helpfulness and safety, outperforming many open-source and closed chat models on common industry benchmarks.


Llama 3.2-3B-Instruct-FP16: Multilingual Dialogue and Agentic AI

Llama 3.2-3B-Instruct-FP16, developed by Meta, is a 3.21 billion parameter instruction-tuned model from the Llama 3.2 family. It is specifically optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This model leverages an optimized transformer architecture with Grouped-Query Attention (GQA) for enhanced inference scalability and was trained on up to 9 trillion tokens of publicly available data, with a knowledge cutoff of December 2023.

Key Capabilities

  • Multilingual Performance: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with broader training across other languages.
  • Dialogue Optimization: Instruction-tuned for assistant-like chat and agentic applications such as knowledge retrieval, summarization, mobile AI-powered writing assistants, and query/prompt rewriting.
  • Quantization Support: Designed with quantization schemes (SpinQuant, QLoRA) for efficient deployment in constrained environments like mobile devices, significantly reducing model size and improving inference speed.
  • Robust Safety Alignment: Utilizes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety, incorporating extensive red teaming and safety mitigations.
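
Instruct-tuned Llama 3-family models expect a header-based chat prompt format. In practice you would call the tokenizer's `apply_chat_template`, but a minimal sketch of building that prompt by hand (the helper name `build_llama3_prompt` is our own) makes the layout explicit:

```python
def build_llama3_prompt(messages):
    """Format a list of {"role", "content"} dicts into the Llama 3 instruct
    prompt layout: one header block per turn, ending with an open
    assistant header for the model to complete."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Leave the assistant header open so generation continues from here.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```

When serving through a chat-completions endpoint, this formatting is applied server-side and you only submit the message list.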

Good for

  • Multilingual Chatbots: Building conversational AI agents that can interact effectively across multiple supported languages.
  • Agentic Applications: Developing systems for knowledge retrieval, document summarization, and intelligent prompt rewriting.
  • On-Device AI: Deploying AI capabilities on mobile devices or other environments with limited compute resources, thanks to its optimized quantization methods.
  • Research and Development: Serving as a valuable resource for studying safety fine-tuning and developing new natural language generation tasks.
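
Hosted models like this one are typically reached through an OpenAI-compatible chat-completions API. A hedged sketch of the request payload for a multilingual chat turn (the base URL is an assumption; check your provider's documentation for the real endpoint and authentication):

```python
import json

# Hypothetical endpoint; verify against your provider's API docs.
BASE_URL = "https://api.featherless.ai/v1/chat/completions"

payload = {
    "model": "context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16",
    "messages": [
        {"role": "system", "content": "You are a concise multilingual assistant."},
        {"role": "user", "content": "Resume este párrafo en una frase: ..."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}
body = json.dumps(payload)
# To send: requests.post(BASE_URL, json=payload,
#                        headers={"Authorization": f"Bearer {API_KEY}"})
```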

Popular Sampler Settings

Featherless users most commonly tune the following sampler parameters for this model:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
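
These knobs shape the token distribution at each decode step. A minimal NumPy sketch of how temperature, min_p, and top_p interact (illustrative only, not the exact serving-stack implementation; the function name is our own):

```python
import numpy as np

def sample_filter(logits, temperature=0.7, top_p=0.9, min_p=0.05):
    """Apply temperature scaling, min_p filtering, and top_p (nucleus)
    filtering to raw logits; return a renormalized distribution."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()

    # min_p: drop tokens whose probability falls below min_p * (top probability)
    keep = probs >= min_p * probs.max()

    # top_p: keep the smallest high-probability set covering >= top_p mass
    order = np.argsort(-probs)
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    nucleus = np.zeros_like(keep)
    nucleus[order[:cutoff]] = True

    filtered = np.where(keep & nucleus, probs, 0.0)
    return filtered / filtered.sum()
```

Lower temperature sharpens the distribution before filtering, while min_p and top_p each prune the low-probability tail by a different criterion; the surviving mass is renormalized before sampling.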