fn-aka-mur/qw3-4b-v17-gs180
The fn-aka-mur/qw3-4b-v17-gs180 model is a 4 billion parameter causal language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 by fn-aka-mur. It utilizes Agentic Reinforcement Learning to enhance multi-turn agent task performance, specifically excelling in household tasks (ALFWorld) and database operations (DBBench). This model is optimized for complex, multi-step agentic workflows, offering improved reliability in automated task execution.
Loading preview...
Model Overview
The fn-aka-mur/qw3-4b-v17-gs180 model is a 4 billion parameter instruction-tuned language model, developed by fn-aka-mur. It is fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model using Agentic Reinforcement Learning (RL).
Key Capabilities
- Enhanced Agentic Performance: Specifically trained to improve performance on multi-turn agent tasks.
- Task Domains: Demonstrates proficiency in:
- ALFWorld: Complex household task execution.
- DBBench: Database operation tasks.
- Training Method: Leverages Agentic Reinforcement Learning, indicating a focus on sequential decision-making and planning within environments.
- Training Configuration: Utilized a maximum sequence length of 8192 tokens and a learning rate of 1e-06 during its RL fine-tuning process.
Good For
- Automated Agents: Ideal for developing AI agents that need to perform multi-step tasks in simulated or real-world environments.
- Complex Workflow Automation: Suitable for applications requiring an LLM to interact with tools or environments over multiple turns to achieve a goal.
- Research in Agentic AI: Provides a specialized model for exploring and developing agentic capabilities, particularly in household and database interaction contexts.
Licensing
The model's training data (u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/dbbench_sft_dataset_react_v4) is distributed under the MIT License. Users must comply with both the MIT License and the original terms of use for the base Qwen model.