Qwen3-4B Agent Trajectory (v16) Overview
This model, choco800/qwen3-4b-agent-v16, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with LoRA using Unsloth, and the adapter weights have been merged into the checkpoint, so there is no need to load the base model and an adapter separately.
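Because the LoRA weights are already merged, the checkpoint loads like any standard causal language model. A minimal loading sketch using the Hugging Face transformers API (the model ID comes from this card; the dtype and device settings are illustrative defaults, not requirements):

```python
MODEL_ID = "choco800/qwen3-4b-agent-v16"

def load_agent(model_id: str = MODEL_ID):
    """Load tokenizer and model for the merged checkpoint."""
    # Imports deferred so the sketch reads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Merged weights: no PEFT / adapter-loading step is needed.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model
```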
Key Capabilities & Training Focus
The primary objective of this model's training was to significantly improve multi-turn agent task performance, specifically within the ALFWorld environment. The training methodology focused on:
- Learning from full trajectories: Each training example contained a complete multi-turn episode, so the model learned from environment observations, action selection, tool usage, and error recovery in context, with loss applied across every assistant turn in the trajectory.
- Response-only loss masking: Within each trajectory, loss was computed only on tokens the assistant generated; system prompts, user turns, and environment observations were masked out, focusing learning on producing appropriate actions and dialogue.
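The masking scheme above can be sketched in a few lines. This is a toy illustration, not the training code: it assumes the common transformers convention that label positions set to -100 are ignored by the cross-entropy loss, and the token IDs and role names are made up for clarity.

```python
# Labels equal to IGNORE_INDEX are skipped by cross-entropy loss
# (the convention used by Hugging Face transformers).
IGNORE_INDEX = -100

def mask_non_assistant(token_ids, roles):
    """Return labels where only assistant-generated tokens keep their ID.

    token_ids: one token ID per position
    roles:     the role that produced each token
               ("system", "user", "environment", or "assistant")
    """
    return [
        tid if role == "assistant" else IGNORE_INDEX
        for tid, role in zip(token_ids, roles)
    ]

# Example: a short trajectory -- an observation, then an agent action.
ids = [11, 12, 13, 21, 22]
roles = ["user", "user", "user", "assistant", "assistant"]
print(mask_non_assistant(ids, roles))  # [-100, -100, -100, 21, 22]
```

Loss is still computed on every assistant turn of the trajectory; only the non-assistant tokens are masked out.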
Training Details
The model was trained for 1 epoch with a maximum sequence length of 8192 tokens and a learning rate of 1e-05. The training data consisted of several versions of the sft_alfworld_trajectory_dataset from u-10bei, licensed under MIT. Users must also comply with the base model's Apache 2.0 license.
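The reported hyperparameters can be summarized as a configuration fragment. Only the three values stated above come from this card; the key names follow common SFT-trainer conventions and are otherwise an assumption.

```python
# Hyperparameters reported on this card; key names are assumed
# SFT-trainer-style conventions, not taken from the training script.
training_config = {
    "num_train_epochs": 1,
    "max_seq_length": 8192,
    "learning_rate": 1e-05,
}
```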
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Agentic behavior: Developing AI agents that can perform complex, multi-step tasks.
- Interactive environments: Scenarios where an agent needs to observe, act, and adapt over multiple turns.
- Tool use and error recovery: Systems that benefit from an agent's ability to utilize tools and recover from mistakes in a structured environment.
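The observe-act-adapt pattern these use cases share can be sketched as a simple driver loop. This is a minimal sketch with a stub environment and a stub policy; in practice the policy would call the fine-tuned model on the full conversation history, and the commands shown are illustrative, not actual ALFWorld actions.

```python
def run_episode(env_step, policy, first_obs, max_turns=10):
    """Drive a text-environment episode, feeding each observation back
    to the policy so it can adapt (e.g. recover from a failed action)."""
    history = [{"role": "user", "content": first_obs}]
    for _ in range(max_turns):
        action = policy(history)
        history.append({"role": "assistant", "content": action})
        obs, done = env_step(action)
        history.append({"role": "user", "content": obs})
        if done:
            break
    return history

# Stub environment: succeeds only once the agent opens the drawer.
def env_step(action):
    if action == "open drawer 1":
        return ("The drawer is open. You see a key.", True)
    return ("Nothing happens.", False)

# Stub policy: retries with a different action after a failure,
# standing in for the model's error-recovery behavior.
def policy(history):
    if any("Nothing happens" in m["content"] for m in history):
        return "open drawer 1"
    return "go to drawer 1"

transcript = run_episode(env_step, policy, "You are in a room with a drawer.")
print(transcript[-1]["content"])  # The drawer is open. You see a key.
```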