yolay/SPEAR-ALFWorld-DrBoT-GiGPO-1.5B
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Sep 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights
yolay/SPEAR-ALFWorld-DrBoT-GiGPO-1.5B is a 1.5-billion-parameter model developed by Yulei Qin and collaborators, based on Qwen2.5-1.5B-Instruct. It implements the SPEAR (Self-imitation with Progressive Exploration for Agentic Reinforcement Learning) framework for training agentic LLMs on long-horizon, sparse-reward tasks. SPEAR balances exploration and exploitation through curriculum-based self-imitation learning: the agent replays its own high-reward trajectories alongside fresh rollouts, progressively shifting that mix as training advances. The resulting model achieves significant gains on agentic benchmarks such as ALFWorld and WebShop relative to baseline methods.
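To make the self-imitation idea concrete, here is a minimal sketch of the mechanism the description refers to: a buffer that retains only the agent's highest-return trajectories, and a batch builder that mixes replayed successes with fresh on-policy rollouts. This is an illustrative simplification, not the SPEAR implementation; the class names, the `capacity` and `imitation_ratio` parameters, and the fixed mixing ratio (which SPEAR would instead adjust on a curriculum) are all assumptions for the sketch.

```python
import random

class SelfImitationBuffer:
    """Keeps only the highest-return trajectories seen so far.

    A toy stand-in for a self-imitation replay buffer; names and
    defaults here are hypothetical, not from the SPEAR codebase.
    """

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.trajectories = []  # list of (return, trajectory) pairs

    def add(self, trajectory, ret):
        self.trajectories.append((ret, trajectory))
        # Retain only the top-`capacity` trajectories by return,
        # so the buffer holds the agent's past successes.
        self.trajectories.sort(key=lambda pair: pair[0], reverse=True)
        del self.trajectories[self.capacity:]

    def sample(self, k):
        # Draw past successes to imitate in the next update.
        k = min(k, len(self.trajectories))
        return [traj for _, traj in random.sample(self.trajectories, k)]


def build_batch(on_policy, buffer, imitation_ratio=0.25):
    """Mix fresh rollouts with replayed successes.

    In SPEAR the imitation weight would follow a curriculum; here it
    is a fixed ratio purely for illustration.
    """
    k = int(len(on_policy) * imitation_ratio)
    return on_policy + buffer.sample(k)
```

Usage: after each episode, call `buffer.add(trajectory, episode_return)`; at update time, pass the fresh rollouts through `build_batch` so a fraction of each training batch consists of the agent's own best past behavior, which densifies the learning signal under sparse rewards.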