miaolu3/qwen3-8b-alfworld-rl-step570
The miaolu3/qwen3-8b-alfworld-rl-step570 model is an 8 billion parameter Qwen3-based language model fine-tuned with reinforcement learning specifically for the ALFWorld text-world benchmark. Developed by miaolu3, this model excels at navigating and reasoning within text-based environments, demonstrating a validation success rate of approximately 95.7% on valid_seen ALFWorld tasks. It is designed to reason and emit actions within a structured chat template, making it suitable for embodied AI research and as a strong baseline for ALFWorld policy development.
Loading preview...
Overview
This model, miaolu3/qwen3-8b-alfworld-rl-step570, is an 8 billion parameter variant of the Qwen3 architecture, specifically fine-tuned using reinforcement learning (RL) for the ALFWorld text-world benchmark. It represents a snapshot at training step 570, demonstrating specialized capabilities in navigating and interacting within complex text-based environments.
Key Capabilities
- ALFWorld Task Performance: Achieves a high validation success rate of ~0.957 on
valid_seenALFWorld tasks, indicating strong performance in text-based reasoning and action selection. - Reinforcement Learning Fine-tuning: Leverages RL to optimize its decision-making process within interactive text environments.
- Structured Inference: Utilizes the Qwen3 chat template with
enable_thinking=True, allowing the model to explicitly reason within<think>...</think>tags and output chosen actions within<action>...</action>tags, based on observations and admissible actions. - Base Model: Built upon the robust Qwen/Qwen3-8B foundation, inheriting its tokenizer and general language understanding.
Intended Use Cases
- Embodied AI Research: Serves as a strong baseline for research in ALFWorld policy development and other text-based interactive AI tasks.
- Distillation Source: Suitable as a source model for distilling knowledge into smaller student models, such as Qwen3-0.6B or Qwen2.5-0.5B, for more efficient deployment.
- Agent Development: Ideal for developers working on agents that need to understand, reason, and act within text-based game or simulation environments.