choco800/qwen3-4b-agent-v17

Hugging Face · Text Generation

  • Model Size: 4B
  • Quantization: BF16
  • Context Length: 32k
  • Published: Mar 1, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

choco800/qwen3-4b-agent-v17 is a 4-billion-parameter model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 to improve multi-turn agent task performance. It specializes in tasks requiring environment observation, action selection, tool use, and error recovery, particularly in environments such as ALFWorld. The model is optimized for agentic workflows, learning from the assistant turns of multi-turn trajectories. It has a 32,768-token context length and ships as a fully merged checkpoint, so no separate base model needs to be loaded.


Model Overview

The choco800/qwen3-4b-agent-v17 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. This version is provided as a fully merged model, meaning it includes all weights and does not require loading a separate base model, simplifying deployment.
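Because the checkpoint is fully merged, it can be loaded directly with the standard `transformers` API. A minimal sketch; the dtype and device settings below are assumptions for a typical setup, not specified by the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "choco800/qwen3-4b-agent-v17"

def load_model():
    # The checkpoint is fully merged: no base model or PEFT adapter
    # needs to be loaded alongside it.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",  # the card lists BF16 weights
        device_map="auto",       # assumption: accelerate-style device placement
    )
    return model, tokenizer
```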

Key Capabilities

  • Enhanced Multi-Turn Agent Performance: Specifically trained to improve performance in complex, multi-turn agent tasks.
  • Agentic Trajectory Learning: Optimized for learning from assistant turns, covering environment observation, action selection, tool use, and error recovery.
  • ALFWorld Optimization: Demonstrates improved capabilities in household task environments like ALFWorld.
  • Efficient Training: Utilizes LoRA and Unsloth for efficient fine-tuning, with loss applied to all assistant turns in the trajectory.
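The trajectory structure these capabilities describe (observe, select an action, recover from errors) can be sketched as a minimal agent loop. Everything here is illustrative: `scripted_policy` stands in for the model and `ToyEnv` is a stub, not ALFWorld.

```python
def run_episode(policy, env, max_turns=10):
    """Drive a multi-turn trajectory: observe, act, and retry on errors."""
    trajectory = []
    observation = env.reset()
    for _ in range(max_turns):
        action = policy(observation, trajectory)     # action selection
        observation, done, error = env.step(action)  # environment observation
        trajectory.append((action, observation))
        if error:
            # Error recovery: the next policy call sees the failed action
            # and its error observation in the trajectory.
            continue
        if done:
            break
    return trajectory

class ToyEnv:
    """Stub household environment: the goal is to 'take apple'."""
    def reset(self):
        return "You see an apple on the table."

    def step(self, action):
        if action == "take apple":
            return "You pick up the apple.", True, False
        return f"Nothing happens when you try: {action}", False, True

def scripted_policy(observation, trajectory):
    # First attempt deliberately fails, forcing one error-recovery turn.
    return "open fridge" if not trajectory else "take apple"
```

In training, each `(action, observation)` pair becomes one assistant turn followed by a user/observation turn, which is the trajectory format the model learns from.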

Training Details

The model was trained for 1 epoch with a maximum sequence length of 8192 tokens and a learning rate of 3e-6. It uses NEFTune noise for regularization and was trained on the dbbench_sft_dataset_react datasets (v1–v4). Loss was computed only on the assistant's responses; user prompts and environment observations were masked out.
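The assistant-only loss described above is typically implemented by setting the labels of non-assistant tokens to -100, the ignore index of PyTorch's cross-entropy loss. A minimal sketch over a pre-tokenized trajectory; the role-tagged segment format is a simplifying assumption, not the card's actual data layout:

```python
IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss by default

def mask_non_assistant(segments):
    """Build (input_ids, labels) so loss falls only on assistant tokens.

    `segments` is a list of (role, token_ids) pairs covering one
    multi-turn trajectory in order.
    """
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)                        # train on these
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # prompts/observations masked
    return input_ids, labels
```

For example, a trajectory of user, assistant, observation, and assistant segments yields labels where only the two assistant spans carry real token ids.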

Good For

  • Developing and evaluating AI agents for complex, multi-step tasks.
  • Applications requiring robust tool use and error recovery in simulated environments.
  • Research into agentic behavior and multi-turn interaction within LLMs.