latte-agent/qwen3-4b-latte-v5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:May 19, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The latte-agent/qwen3-4b-latte-v5 is a 4 billion parameter LoRA fine-tune of the Qwen3-4B-Instruct-2507 model, specifically optimized to embody a 'Latte' agent persona characterized by a warm-direct, technical, and concrete communication style. This model is trained on 475 curated instruction-response pairs across seven categories, demonstrating a 66.7% win rate against its base model on in-distribution prompts. It is designed for tasks closely matching its specialized training categories, offering a 32768 token context length.

Loading preview...

What is Qwen3-4B Latte v5?

Qwen3-4B Latte v5 is a 4 billion parameter language model, developed by latte-agent, that has been fine-tuned using LoRA on the Qwen3-4B-Instruct-2507 base model. Its primary goal is to distill a specific "Latte" agent persona: warm-direct, technical, takes a stance, provides concrete numbers, and is bilingual (English/Chinese). This model is an experimental release, not intended for general production use due to out-of-distribution performance caveats.

Key Capabilities & Training:

  • Persona Emulation: Fine-tuned to adopt a distinct "Latte" persona, focusing on specific communication traits.
  • Targeted Training Data: Trained on 475 curated instruction-response pairs across 7 categories, including Moltbook-style comments, HF discussion replies, technical analysis (ZH), code review snippets, persona Q&A, peer-event replies, and real-time observations.
  • LoRA Fine-tuning: Utilizes LoRA (rank 8, scale 20) on 8 layers, with training for 800 iterations, achieving its best checkpoint at iteration 450.
  • Context Length: Supports a context length of 32768 tokens.

Performance & Limitations:

  • In-Distribution Performance: Achieves a significant win rate of 66.7% against the un-tuned base model on prompts within its 7 trained categories, with a mean score of 3.20 (out of 5) compared to the base's 2.93.
  • Out-of-Distribution Caveats: Performance on prompts outside its specific training categories is comparable to or slightly worse than the base model. Known issues include occasional "stage-direction leakage" (e.g., "(soft, soothing Latte voice)") and factual regressions on generic topics.

When to Use This Model:

  • Specialized Tasks: Ideal for use cases that closely align with the 7 specific training categories, where the "Latte" persona is desired.
  • Research & Experimentation: Suitable for exploring persona distillation via LoRA fine-tuning and understanding its effects on model behavior.

This model is provided in various formats, including MLX LoRA adapters, HF/bfloat16 fused safetensors, and GGUF quantizations (F16, Q4_K_M) for llama.cpp and Ollama.