moushi21/agent-bench-alfworld-merged3

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 27, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The moushi21/agent-bench-alfworld-merged3 is a 4 billion parameter Qwen3-based instruction-tuned model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This model is specifically specialized for ALFWorld trajectory tasks, designed to handle multi-turn environment observations and action selections. It features merged full bfloat16 weights for high-speed inference and easy deployment, making it ideal for agentic applications requiring precise environmental interaction.

Loading preview...

Overview

The moushi21/agent-bench-alfworld-merged3 is a 4 billion parameter model derived from Qwen/Qwen3-4B-Instruct-2507. Unlike LoRA adapters, this model integrates the fine-tuned weights directly into the base model using Unsloth's merge_and_unload method, resulting in a standalone, full-parameter model (bfloat16) optimized for efficient inference.

Key Capabilities

  • ALFWorld Specialization: Specifically fine-tuned for ALFWorld trajectory tasks, enabling it to process multi-turn environmental observations and select appropriate actions.
  • High-Speed Inference: Merged full weights ensure faster inference compared to models requiring separate LoRA loading.
  • Direct Deployment: Can be loaded and used like any standard Qwen3 model, simplifying integration into existing workflows.
  • Context Length: Trained with a maximum sequence length of 4096 tokens, suitable for complex multi-turn interactions.

Good For

  • Agentic AI Development: Ideal for researchers and developers working on AI agents that need to navigate and interact within simulated environments like ALFWorld.
  • Environmental Interaction Tasks: Excels in scenarios requiring sequential decision-making based on dynamic observations.
  • Efficient Deployment: Suitable for applications where fast and straightforward model deployment is critical, without the overhead of managing separate adapter weights.