Model Overview
choco800/qwen3-4b-agent-v24 is a 4-billion-parameter language model, fully merged and fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth. Unlike adapter-only repositories, this one contains the complete merged weights, so no separate base model needs to be loaded.
Key Capabilities
This model is specifically trained to enhance multi-turn agent task performance, particularly within environments like ALFWorld (household tasks). Its training objective focuses on enabling the model to:
- Learn environment observation: Understand and interpret the state of an interactive environment.
- Perform action selection: Choose appropriate actions based on observations and task goals.
- Utilize tools: Integrate and effectively use external tools within a task trajectory.
- Recover from errors: Adapt and correct its behavior in response to unexpected outcomes or failures.
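The observe/act/recover cycle described above can be sketched as a minimal agent loop. All names here (`env.reset`, `env.step`, `policy`) are illustrative stand-ins, not the model's actual API; `env.step` is assumed to return an observation, a done flag, and a success flag.

```python
def run_episode(env, policy, max_steps=10):
    """Minimal observe-act loop with naive error recovery.

    `env.step(action)` returns (observation, done, ok); `policy` maps the
    observation history to an action string. Hypothetical interface, for
    illustration only.
    """
    obs, done = env.reset(), False
    history = [obs]
    for _ in range(max_steps):
        action = policy(history)
        obs, done, ok = env.step(action)
        if not ok:
            # Error recovery: feed the failure message back so the policy
            # can pick a different action on the next turn.
            history.append(f"Action failed: {obs}")
            continue
        history.append(obs)
        if done:
            break
    return history, done
```

In a real deployment the `policy` callable would wrap a generation call to this model, with the environment observations serialized into the chat history.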
Loss was applied to all assistant turns in the multi-turn trajectory, so the model learns from every one of its responses in an interaction, not only the final answer.
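Concretely, supervising only the assistant turns usually means setting the label to the ignore index (-100 in PyTorch cross-entropy) for every non-assistant token. A minimal sketch, with a stand-in `tokenize` callable in place of the real tokenizer:

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips

def build_labels(turns, tokenize):
    """Concatenate a multi-turn trajectory and mask loss to assistant turns.

    `turns` is a list of (role, text) pairs; `tokenize` is any callable
    mapping text to a list of token ids (a stand-in for the real tokenizer).
    """
    input_ids, labels = [], []
    for role, text in turns:
        ids = tokenize(text)
        input_ids.extend(ids)
        # Supervise assistant tokens; ignore user/tool/system tokens.
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels
```

The exact token boundaries in this model's training pipeline depend on the chat template; the sketch only shows the masking principle.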
Training Details
The model was trained for 1 epoch with a maximum sequence length of 8192 tokens and a learning rate of 7e-06. Training used LoRA (r=8, alpha=16), later merged into the base weights, together with NEFTune noise (NEFTUNE_NOISE_ALPHA=5.0) to improve training stability and performance. The training data consisted primarily of ALFWorld trajectory datasets (v3, v4, v5) from u-10bei, with loss masking restricting supervision to the assistant's responses.
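Since this repository ships merged weights, the LoRA update has already been folded into each target matrix as W' = W + (alpha/r) · B · A. A pure-Python sketch of that arithmetic (toy nested-list matrices, not the actual checkpoint code):

```python
def merge_lora(W, A, B, r=8, alpha=16):
    """Fold a LoRA update into a base weight: W' = W + (alpha/r) * B @ A.

    W is (m x n), B is (m x r_eff), A is (r_eff x n), as nested lists.
    r=8, alpha=16 match this model's training config, giving a scale of 2.
    """
    scale = alpha / r
    rows, cols, rank = len(B), len(A[0]), len(A)
    return [
        [
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]
```

Because the merge is already done, inference frameworks can load the repository like any ordinary dense checkpoint, with no adapter-loading step.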
Good For
- Developing AI agents for interactive, multi-step tasks.
- Applications requiring robust tool use and error recovery in simulated or real-world environments.
- Research into agentic LLMs and their performance in complex task execution.