Jackrong/GPT-5-Distill-llama3.2-3B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Context Length: 32K · Published: Nov 29, 2025 · License: llama3.2 · Architecture: Transformer

Jackrong/GPT-5-Distill-llama3.2-3B-Instruct is a 3.2-billion-parameter instruction-tuned language model built on the Llama 3.2 architecture and optimized for edge and consumer-GPU deployment. The model uses knowledge distillation from GPT-5 responses to reproduce the reasoning and conversational patterns of a much larger model, offering flagship-style instruction following in a compact package. With a 32K-token context window, it suits on-device chat, reasoning, summarization, and RAG over moderate-sized documents.


Overview

Jackrong/GPT-5-Distill-llama3.2-3B-Instruct is an instruction-tuned language model based on the Llama 3.2-3B architecture, designed for high-efficiency edge and consumer-GPU deployment. It has approximately 3.2 billion parameters and supports a 32K-token context length. Its key differentiator is the training methodology: Supervised Fine-Tuning (SFT) with knowledge distillation from over 100,000 high-quality GPT-5 responses, filtered to keep only exchanges labeled "normal" (i.e., flawless). This distillation aims to give the smaller Llama model the reasoning ability and conversational style typically found in much larger models.
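
Since the model is instruction-tuned on the Llama 3.2 base, it expects Llama 3-style chat formatting at inference time. In practice you would let `tokenizer.apply_chat_template` apply the exact template shipped with the model; the sketch below is only a plain-Python illustration of the general Llama 3 instruct prompt shape (special-token layout assumed from the standard Llama 3 format, not verified against this checkpoint):

```python
def build_llama3_prompt(messages):
    """Assemble a Llama 3-style instruct prompt from role/content dicts.

    Illustrative only: use tokenizer.apply_chat_template in real code,
    which applies the exact template bundled with the model.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn: header with the role, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain knowledge distillation in one sentence."},
])
print(prompt)
```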

Key Capabilities

  • GPT-5 Distilled Logic: Mimics superior reasoning and conversational tone by learning from filtered GPT-5 outputs.
  • Edge-Optimized: Its 3.2B parameter count allows efficient operation on laptops, phones, and low-VRAM GPUs.
  • Long Context Window: Supports up to 32,768 tokens, suitable for processing moderate-sized documents in RAG applications.
  • GGUF Ready: Native GGUF support (Q4_K_M, Q8_0, FP16) for quantized deployment.
  • Instruction Following: Enhanced by a mix of ShareGPT-Qwen3 instructions and distilled GPT-5 responses.
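
The GGUF quantization levels above trade file size for fidelity. As a rough back-of-envelope (the effective bits-per-weight figures are typical llama.cpp approximations, not exact; real files vary with tensor mix and metadata overhead):

```python
PARAMS = 3.2e9  # approximate parameter count

# Rough effective bits per weight for common GGUF quant levels
# (approximate community figures, not exact file sizes).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}

def approx_size_gb(quant: str) -> float:
    """Estimated on-disk size in gigabytes for a given quant level."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{approx_size_gb(q):.1f} GB")
```

By this estimate, Q4_K_M lands around 2 GB, which is what makes the low-VRAM and on-device deployments listed above practical.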

Recommended Use Cases

  • On-Device Chat: Ideal for local execution due to its small footprint.
  • Reasoning & Explanations: Benefits from distilled GPT-5 logic for clearer and more structured answers.
  • Summarization & Rewriting: Strong capabilities in English and Chinese, inherited from its diverse training data.
  • RAG Applications: The 32K context window makes it suitable for retrieving and processing information from documents.
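
In a RAG pipeline, the 32K window must be split between the prompt, the retrieved chunks, and the generated answer. A minimal budgeting sketch (the chunk size and reserve values are illustrative defaults, not model requirements):

```python
CTX_LEN = 32_768  # model context window in tokens

def max_retrieved_chunks(chunk_tokens: int, prompt_tokens: int = 512,
                         answer_tokens: int = 1_024) -> int:
    """How many retrieved chunks of a given token size fit in the window,
    after reserving room for the prompt/question and the generated answer."""
    budget = CTX_LEN - prompt_tokens - answer_tokens
    return max(budget // chunk_tokens, 0)

# With 800-token chunks, dozens of retrieved passages still fit.
print(max_retrieved_chunks(800))
```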