meta-llama/Llama-3.2-1B-Instruct

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32K · Published: Sep 18, 2024 · License: llama3.2 · Architecture: Transformer · Gated

The Llama 3.2-1B-Instruct model by Meta is a 1.23-billion-parameter instruction-tuned generative language model from the Llama 3.2 collection. Optimized for multilingual dialogue, it performs well on agentic retrieval and summarization tasks across the officially supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The model uses an optimized transformer architecture, is aligned via supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), and supports a 32,768-token context length.


Overview

Meta's Llama 3.2-1B-Instruct is a 1.23 billion parameter instruction-tuned model from the Llama 3.2 family, designed for multilingual text-in/text-out generative tasks. It leverages an optimized transformer architecture and is aligned with human preferences through supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). The model was pretrained on up to 9 trillion tokens of publicly available data, with a knowledge cutoff of December 2023, and incorporates knowledge distillation from larger Llama 3.1 models.

Key Capabilities

  • Multilingual Dialogue: Optimized for assistant-like chat and agentic applications in officially supported languages including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Long Context Handling: Features a substantial context length of 32,768 tokens, enabling processing of extensive inputs.
  • Quantization Support: Designed with quantization schemes (SpinQuant, QLoRA) for efficient deployment in constrained environments, such as mobile devices, significantly improving inference speed and reducing memory footprint.
  • Safety Alignment: Developed with a strong focus on safety, incorporating extensive fine-tuning, red teaming, and safeguards to mitigate risks.

Intended Use Cases

  • Assistant-like Chat: Ideal for conversational AI applications requiring instruction following.
  • Agentic Applications: Suited for tasks like knowledge retrieval, summarization, and query/prompt rewriting.
  • On-Device Deployment: Quantized versions are specifically adapted for use cases with limited compute resources, such as mobile AI-powered writing assistants.
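For the assistant-like chat use case above, a minimal usage sketch (not an official example) with Hugging Face `transformers` might look like the following. The model ID comes from this card; the system prompt and generation settings are illustrative assumptions.

```python
# Sketch: chatting with meta-llama/Llama-3.2-1B-Instruct via transformers.
# The system prompt and max_new_tokens value are illustrative assumptions.

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

def build_messages(user_prompt, system_prompt="You are a helpful assistant."):
    """Assemble the chat-format message list consumed by the model's chat template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def chat(user_prompt, max_new_tokens=256):
    """Generate an assistant reply. Imports are deferred so the pure helper
    above can be used without torch/transformers installed."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    input_ids = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Note that this model is gated: you must accept Meta's license on Hugging Face and authenticate before the weights will download.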

Popular Sampler Settings

The three parameter combinations most used by Featherless users for this model adjust the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
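Against an OpenAI-compatible endpoint, these sampler settings map directly onto request fields. A hedged sketch, where every numeric default below is an illustrative assumption rather than one of the actual popular configurations:

```python
# Sketch: assembling a chat-completions request body that exercises the
# sampler parameters listed above. All numeric defaults are illustrative
# assumptions, not the real user-popular values.

DEFAULT_SAMPLERS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}

def build_request(prompt, **overrides):
    """Return a request body for an OpenAI-compatible chat-completions API,
    merging caller overrides over the illustrative sampler defaults."""
    unknown = set(overrides) - set(DEFAULT_SAMPLERS)
    if unknown:
        raise ValueError(f"unknown sampler parameters: {sorted(unknown)}")
    return {
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        **DEFAULT_SAMPLERS,
        **overrides,
    }
```

For example, `build_request("Hi", temperature=1.0)` keeps all other defaults but raises the temperature, and an unrecognized keyword is rejected early rather than silently ignored by the server.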