Overview
This model, choco800/qwen3-4b-agent-v4, is a 4-billion-parameter language model based on Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with Unsloth and released as a fully merged model, so no separate base model needs to be loaded. The primary training objective was to significantly improve multi-turn agent task performance.
Key Capabilities
- Multi-turn Agent Trajectory Learning: The model is trained to improve performance across entire multi-turn agent trajectories, applying loss to all assistant turns.
- Environment Interaction: It learns to process environment observations and select appropriate actions.
- Tool Use: The model can invoke and use tools within its agent loop.
- Error Recovery: Training places particular emphasis on recovering from errors encountered during task execution.
- Specialized Task Domains: Demonstrates proficiency on ALFWorld (household tasks) and DBBench (database operations).
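The multi-turn trajectory structure the capabilities above describe can be sketched as an act-observe loop. The environment and policy below are toy stand-ins (hypothetical names, not the ALFWorld or DBBench APIs); the point is only the shape of the trajectory that training optimizes over, where assistant actions alternate with environment observations.

```python
# Toy sketch of a multi-turn agent trajectory: the model (toy_policy) picks an
# action, the environment (toy_env_step) returns an observation, and the full
# alternating sequence of turns forms the training trajectory.

def toy_env_step(state, action):
    """Hypothetical environment: counts an integer state down to a goal of 0."""
    if action == "decrement":
        state -= 1
    done = state == 0
    observation = f"state is now {state}"
    return state, observation, done

def toy_policy(observation):
    """Stand-in for the model's action selection given the latest observation."""
    return "decrement"

def run_trajectory(initial_state, max_turns=10):
    """Roll out one multi-turn trajectory as a list of (role, content) turns."""
    trajectory = []
    state = initial_state
    obs = f"state is {initial_state}"
    for _ in range(max_turns):
        action = toy_policy(obs)
        trajectory.append(("assistant", action))
        state, obs, done = toy_env_step(state, action)
        trajectory.append(("observation", obs))
        if done:
            break
    return trajectory

traj = run_trajectory(3)
# traj alternates assistant/observation turns and ends when the goal is reached:
# [..., ("observation", "state is now 0")]
```

In the real setting, the policy is the fine-tuned model and the observations come from ALFWorld or a database environment, but the trajectory layout is the same.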
Training Details
The model was trained for 1 epoch with a maximum sequence length of 8192 tokens, using LoRA with r=16 and alpha=32. Loss was computed exclusively on the assistant's responses, with user prompts and environment observations masked out. The training data includes u-10bei/sft_alfworld_trajectory_dataset_v3 and u-10bei/dbbench_sft_dataset_react_v4, both distributed under the MIT License.
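The assistant-only loss masking described above is typically implemented by setting the labels of non-assistant tokens to the cross-entropy ignore index (-100 in PyTorch). The helper below is a minimal sketch of that idea, assuming the trajectory is already tokenized into per-turn lists; it is illustrative, not the exact Unsloth training code.

```python
# Sketch of assistant-only loss masking: keep label ids for assistant turns,
# replace all user/observation token labels with IGNORE_INDEX so the loss
# function skips them.
IGNORE_INDEX = -100  # PyTorch cross-entropy default ignore_index

def build_labels(turns):
    """turns: list of (role, token_ids). Returns (input_ids, labels)."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # loss computed on these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out
    return input_ids, labels

# Example trajectory with illustrative token ids:
turns = [
    ("user", [1, 2, 3]),
    ("assistant", [4, 5]),
    ("observation", [6, 7]),
    ("assistant", [8, 9, 10]),
]
inp, lab = build_labels(turns)
# lab == [-100, -100, -100, 4, 5, -100, -100, 8, 9, 10]
```

Because the mask is applied per turn, loss covers every assistant turn in the trajectory, matching the multi-turn training objective stated above.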