Qwen3-4B Agent Trajectory (v27) Overview
This model, choco800/qwen3-4b-agent-v27, is a fully merged 4-billion-parameter model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth. Unlike adapter repositories, it ships merged weights, so there is no need to load a separate base model.
Key Capabilities & Training Focus
The primary objective of this model's training was to significantly enhance multi-turn agent task performance, specifically within environments like ALFWorld (household tasks). The training methodology applied loss to all assistant turns in a multi-turn trajectory, enabling the model to learn and improve across several critical agentic functions:
- Environment observation: Interpreting and understanding the state of its surroundings.
- Action selection: Choosing appropriate actions based on observations and goals.
- Tool use: Effectively utilizing available tools to complete tasks.
- Error recovery: Adapting and correcting its trajectory when encountering errors.
Training involved a maximum sequence length of 8192 tokens over 1 epoch, with loss computed only on the assistant's responses, masking user prompts and observations.
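The assistant-only loss described above is typically implemented by setting the label of every non-assistant token to the cross-entropy ignore index. The sketch below is illustrative, not the actual training code: the `build_labels` helper, the `(role, text)` turn format, and the toy character-level tokenizer are assumptions for demonstration; in practice this would operate on the real tokenizer's output and the model's chat template.

```python
# Sketch: assistant-only loss masking for a multi-turn trajectory.
# Tokens from user turns and environment observations get IGNORE_INDEX,
# so cross-entropy loss is computed only on assistant tokens.
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default ignore_index


def build_labels(turns, tokenize):
    """turns: list of (role, text) pairs; tokenize: text -> list[int]."""
    input_ids, labels = [], []
    for role, text in turns:
        ids = tokenize(text)
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # loss applied to every assistant turn
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user/observation
    return input_ids, labels


# Toy tokenizer (one "token" per character) for illustration only.
toy = lambda s: [ord(c) for c in s]
ids, labels = build_labels(
    [("user", "go"), ("assistant", "ok"),
     ("observation", "obs"), ("assistant", "done")],
    toy,
)
```

With this scheme the model still conditions on the full trajectory (observations remain in `input_ids`), but gradient updates come only from the assistant's own responses.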
Datasets and Licensing
The model was trained on several versions of the sft_alfworld_trajectory_dataset (v3, v4, v5) from u-10bei, all distributed under the MIT License. Users must comply with both the dataset licenses and the base model's Apache 2.0 license.