yolay/SPEAR-ALFWorld-DrBoT-GiGPO-1.5B
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Sep 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights
yolay/SPEAR-ALFWorld-DrBoT-GiGPO-1.5B is a 1.5-billion-parameter model developed by Yulei Qin and collaborators, based on Qwen2.5-1.5B-Instruct. It implements the SPEAR (Self-imitation with Progressive Exploration for Agentic Reinforcement Learning) framework for training agentic LLMs on long-horizon, sparse-reward tasks. SPEAR balances exploration and exploitation through curriculum-based self-imitation learning: the agent replays its own high-reward trajectories alongside fresh rollouts, progressively shifting that mix as training advances. The resulting model achieves significant gains on agentic benchmarks such as ALFWorld and WebShop relative to baseline methods.
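To make the self-imitation idea concrete, here is a minimal sketch of the mechanism the description refers to: a buffer that retains only the agent's highest-return trajectories, and a batch builder that mixes replayed successes with fresh on-policy rollouts. This is an illustrative simplification, not the SPEAR implementation; the class names, the `capacity` and `imitation_ratio` parameters, and the fixed mixing ratio (which SPEAR would instead adjust on a curriculum) are all assumptions for the sketch.

```python
import random

class SelfImitationBuffer:
    """Keeps only the highest-return trajectories seen so far.

    A toy stand-in for a self-imitation replay buffer; names and
    defaults here are hypothetical, not from the SPEAR codebase.
    """

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.trajectories = []  # list of (return, trajectory) pairs

    def add(self, trajectory, ret):
        self.trajectories.append((ret, trajectory))
        # Retain only the top-`capacity` trajectories by return,
        # so the buffer holds the agent's past successes.
        self.trajectories.sort(key=lambda pair: pair[0], reverse=True)
        del self.trajectories[self.capacity:]

    def sample(self, k):
        # Draw past successes to imitate in the next update.
        k = min(k, len(self.trajectories))
        return [traj for _, traj in random.sample(self.trajectories, k)]


def build_batch(on_policy, buffer, imitation_ratio=0.25):
    """Mix fresh rollouts with replayed successes.

    In SPEAR the imitation weight would follow a curriculum; here it
    is a fixed ratio purely for illustration.
    """
    k = int(len(on_policy) * imitation_ratio)
    return on_policy + buffer.sample(k)
```

Usage: after each episode, call `buffer.add(trajectory, episode_return)`; at update time, pass the fresh rollouts through `build_batch` so a fraction of each training batch consists of the agent's own best past behavior, which densifies the learning signal under sparse rewards.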