Name: miaolu3/qwen3-8b-alfworld-rl-step570 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: miaolu3

Overview

This model, miaolu3/qwen3-8b-alfworld-rl-step570, is an 8 billion parameter variant of the Qwen3 architecture, specifically fine-tuned using reinforcement learning (RL) for the ALFWorld text-world benchmark. It represents a snapshot at training step 570, demonstrating specialized capabilities in navigating and interacting within complex text-based environments.

Key Capabilities

ALFWorld Task Performance: Achieves a high validation success rate of ~0.957 on valid_seen ALFWorld tasks, indicating strong performance in text-based reasoning and action selection.
Reinforcement Learning Fine-tuning: Leverages RL to optimize its decision-making process within interactive text environments.
Structured Inference: Utilizes the Qwen3 chat template with enable_thinking=True, allowing the model to explicitly reason within <think>...</think> tags and output chosen actions within <action>...</action> tags, based on observations and admissible actions.
Base Model: Built upon the robust Qwen/Qwen3-8B foundation, inheriting its tokenizer and general language understanding.

Intended Use Cases

Embodied AI Research: Serves as a strong baseline for research in ALFWorld policy development and other text-based interactive AI tasks.
Distillation Source: Suitable as a source model for distilling knowledge into smaller student models, such as Qwen3-0.6B or Qwen2.5-0.5B, for more efficient deployment.
Agent Development: Ideal for developers working on agents that need to understand, reason, and act within text-based game or simulation environments.

Overview

Overview

Key Capabilities

Intended Use Cases

Full Model Card (README)