JoshXT/AGiXT-Qwen3-VL-4B

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 31, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The JoshXT/AGiXT-Qwen3-VL-4B is a 4 billion parameter vision-language model, fine-tuned from Qwen3-VL-4B-Instruct, specifically designed for AGiXT agent interactions. It excels at understanding and generating AGiXT's XML-based command syntax, tool delegation patterns, and multi-step reasoning for complex agent workflows. This model integrates vision capabilities to analyze screenshots and images, supporting context-aware responses and web automation tasks within the AGiXT ecosystem.

Loading preview...

AGiXT-Qwen3-VL-4B: Vision-Language Model for AGiXT Agents

This 4 billion parameter vision-language model, fine-tuned from Qwen3-VL-4B-Instruct, is purpose-built to enhance AGiXT agent interactions. It was trained on a specialized Agent Interaction Dataset (936 examples) to deeply understand AGiXT's unique operational patterns.

Key Capabilities

  • Native AGiXT Command Understanding: Processes and generates AGiXT's XML-based command execution format, including <execute>, <thinking>, and <answer> tags, with proper parameter formatting for over 778 AGiXT commands.
  • Intelligent Tool Delegation: Learns when to delegate coding tasks to GitHub Copilot versus utilizing other AGiXT extensions like web_browsing or postgres_database.
  • Multi-Step Reasoning: Supports complex agent workflows by maintaining context and executing multi-step reasoning patterns.
  • Vision Integration: Capable of analyzing screenshots for UI state understanding in web automation and processing images shared in conversations for context-aware responses, supporting the View Image command.

How it Fits into AGiXT

This model is part of an integrated system within AGiXT, often working in conjunction with the smaller AGiXT-AbilitySelect-270m model. The AbilitySelect model acts as a fast router, determining task complexity and the most appropriate ability, then routing requests to the AGiXT-Qwen3-VL-4B for moderate to complex tasks (scores 26-75). This ensures efficient resource utilization and optimized response times by using the right-sized model for each task.

Good For

  • Developers building AGiXT agents requiring robust command execution and structured responses.
  • Applications needing vision capabilities for web automation, image analysis, and context-aware interactions within an agentic framework.
  • Scenarios where precise tool use and multi-step reasoning are critical for agent performance.