tuandunghcmut/qwen3-4b-planner-v1

Text generation | Concurrency cost: 1 | Model size: 4B | Quant: BF16 | Context length: 32k | Published: May 20, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights | Warm

tuandunghcmut/qwen3-4b-planner-v1 is a 4-billion-parameter LoRA fine-tune of Qwen/Qwen3-4B-Instruct-2507, optimized for multi-agent planning and tool calling. It generates structured JSON plans and multi-agent system (MAS) orchestrator YAML, and emits native Hermes-style tool calls. It targets applications that need structured planning output from a compact language model.


Overview

tuandunghcmut/qwen3-4b-planner-v1 is a 4-billion-parameter model fine-tuned with LoRA on the Qwen/Qwen3-4B-Instruct-2507 base model. It specializes in multi-agent planning and tool calling, and produces structured outputs for task orchestration.

Key Capabilities

  • Structured Planning Output: Generates structured JSON plans (e.g., {"tasks": [...]}) for short-prompt planning scenarios.
  • MAS Orchestration: Produces MAS orchestrator YAML (action: plan|clarify|done) for the canonical long-prompt planning style.
  • Native Tool-Calling: Emits Hermes-style <tool_call> output with arguments as direct JSON objects rather than JSON-encoded strings, which avoids double-encoding.
  • Training Data: Fine-tuned on a 500k-row dataset mix that includes multiformat, toucan_qwen3, nemotron_chat_if, planner_v01_full, duy_vhb_style_json, and nemotron_structured_outputs.
  • Accessibility: Available as a merged Hugging Face model, a LoRA adapter for integration with the base model, and various GGUF quantizations (f16, q8_0, q4_k_m) for flexible deployment.
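
The tool-calling format described above can be consumed with a few lines of parsing. The following is a minimal sketch, assuming the common Hermes convention of a JSON object with "name" and "arguments" keys inside each <tool_call> block; the tool name get_weather and the helper extract_tool_calls are hypothetical and not part of the model card. The string fallback is only a defensive measure for other models that double-encode arguments.

```python
import json
import re

# Matches each <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of (name, arguments) pairs found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        payload = json.loads(match.group(1))
        arguments = payload.get("arguments", {})
        # Defensive fallback (assumption): if arguments arrive as a JSON
        # string (double-encoded), decode once more so callers get a dict.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        calls.append((payload["name"], arguments))
    return calls


# Hypothetical model output containing one tool call.
sample = (
    "I will check the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Hanoi"}}\n</tool_call>'
)
calls = extract_tool_calls(sample)
```

Because the arguments are already a JSON object, the caller can pass them straight to the tool as keyword arguments without a second json.loads step.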

Good For

  • Agentic Workflows: Suited to applications in which an LLM breaks a user request down into actionable, structured plans for multiple agents.
  • Tool Use Integration: Suited to scenarios where precise, natively formatted tool calls are needed to interact with external systems or APIs.
  • Resource-Constrained Environments: At 4B parameters and with GGUF quantizations available, it can be deployed on edge devices or systems with limited compute while still providing specialized planning capabilities.
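
For the agentic-workflow case, an application would typically validate the model's plan before dispatching work to agents. The sketch below is an illustration under stated assumptions: the model card only specifies the top-level {"tasks": [...]} shape and the action values plan, clarify, and done, so the task fields "id" and "goal" and the helper names parse_short_plan and check_action are hypothetical. The orchestrator YAML is assumed to have been parsed into a dict already by a YAML library of your choice.

```python
import json

# Action values listed in the model card for the MAS orchestrator output.
VALID_ACTIONS = {"plan", "clarify", "done"}


def parse_short_plan(raw):
    """Parse a short-prompt JSON plan and return its task list."""
    plan = json.loads(raw)
    tasks = plan.get("tasks") if isinstance(plan, dict) else None
    if not isinstance(tasks, list):
        raise ValueError('expected a JSON object with a "tasks" list')
    return tasks


def check_action(doc):
    """Validate the action field of an already-parsed orchestrator document."""
    action = doc.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")
    return action


# Hypothetical plan; the per-task schema is an assumption for illustration.
tasks = parse_short_plan('{"tasks": [{"id": 1, "goal": "search flights"}]}')
action = check_action({"action": "clarify"})
```

Rejecting malformed output early keeps a bad generation from reaching downstream agents; a caller might retry the request or fall back to a clarification step when a ValueError is raised.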