rinnic/llama3_2_3B-practice-area-ft-125k-1epochs

Hosted on Hugging Face · Text Generation
Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Aug 27, 2025 · License: llama3.2 · Architecture: Transformer · Status: Warm

rinnic/llama3_2_3B-practice-area-ft-125k-1epochs is a fine-tuned variant of Meta's Llama 3.2 3B, a 3.2-billion-parameter, instruction-tuned, multilingual text-only model optimized for dialogue use cases such as agentic retrieval and summarization. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) and supports a 32,768-token context length, making it suitable for applications requiring efficient multilingual processing.
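For dialogue use, prompts must follow the model's chat template. A minimal sketch of the standard Llama 3 instruct template (an assumption for this fine-tune; in real code, prefer the tokenizer's `apply_chat_template`, which reads the template shipped with the model):

```python
def llama3_prompt(system: str, user: str) -> str:
    """Assemble a raw single-turn prompt in the standard Llama 3 instruct
    template. The special tokens below are the documented Llama 3 ones;
    whether this fine-tune overrides them is an assumption, so prefer
    tokenizer.apply_chat_template in production code."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        # Trailing assistant header cues the model to generate the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Summarize this document."))
```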


Model Overview

This model, rinnic/llama3_2_3B-practice-area-ft-125k-1epochs, is a fine-tuned 3.2-billion-parameter variant from Meta's Llama 3.2 family. It is an instruction-tuned, multilingual text-only model built on an optimized transformer architecture with Grouped-Query Attention (GQA) for improved inference scalability. The base model was trained on up to 9 trillion tokens of publicly available data with a knowledge cutoff of December 2023, and incorporates knowledge distillation from larger Llama 3.1 models during pretraining.
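GQA improves inference scalability by letting several query heads share a single key/value head, shrinking the KV cache by the sharing factor. A minimal numpy sketch of the mechanism (the head counts below are illustrative, not this model's actual configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d),
    with n_heads an exact multiple of n_kv_heads."""
    n_heads, seq, d = q.shape
    group = n_heads // k.shape[0]
    # Each key/value head serves `group` query heads, so the KV cache
    # holds n_kv_heads entries instead of n_heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v

# Illustrative sizes: 8 query heads sharing 2 KV heads (4x smaller KV cache).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 64))
k = rng.standard_normal((2, 16, 64))
v = rng.standard_normal((2, 16, 64))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 16, 64)
```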

Key Capabilities

  • Multilingual Dialogue: Optimized for multilingual chat and agentic applications, including retrieval and summarization tasks.
  • Quantization Support: Features various quantization schemes (SpinQuant, QAT + LoRA) designed for efficient deployment in constrained environments like mobile devices, significantly reducing model size and improving inference speed.
  • Long Context: Supports a context length of 32768 tokens, enabling processing of extensive inputs.
  • Safety Alignment: Developed with a focus on responsible AI, incorporating supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) for alignment with human preferences for helpfulness and safety.
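SpinQuant and QAT + LoRA are more sophisticated than can be shown here, but the core idea behind weight quantization can be sketched with plain symmetric int8 rounding, which already cuts storage 4x versus FP32 (2x versus the BF16 weights this card lists):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale.
    (A teaching sketch only; SpinQuant and QAT + LoRA go well beyond this.)"""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes // q.nbytes)  # 4: int8 weights take a quarter of the FP32 bytes
# Round-to-nearest keeps the reconstruction error within one quantization step.
print(float(np.abs(dequantize(q, scale) - w).max()) <= scale)  # True
```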

Good for

  • Assistant-like Chatbots: Well suited to powering interactive, assistant-style conversational agents.
  • Agentic Applications: Ideal for tasks involving knowledge retrieval and summarization.
  • Mobile AI: Quantized versions are specifically designed for on-device use cases with limited compute resources.
  • Multilingual Deployments: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with potential for fine-tuning in other languages.
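A hedged deployment sketch using the Hugging Face transformers text-generation pipeline (assumes `transformers` and a PyTorch backend are installed; the model id comes from this card, everything else is illustrative):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Chat-format messages as accepted by transformers text-generation pipelines."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def run_chat(user_prompt: str) -> str:
    """Heavy path: downloads several GB of BF16 weights on first call, so the
    transformers import is deferred and nothing runs at module import time."""
    from transformers import pipeline
    generator = pipeline(
        "text-generation",
        model="rinnic/llama3_2_3B-practice-area-ft-125k-1epochs",  # id from this card
        torch_dtype="bfloat16",
    )
    messages = build_messages("You are a helpful multilingual assistant.", user_prompt)
    result = generator(messages, max_new_tokens=128)
    # With chat-style input, generated_text is the message list plus the reply.
    return result[0]["generated_text"][-1]["content"]
```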