yolay/SPEAR-SearchQA-Qwen2.5-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · License: apache-2.0 · Architecture: Transformer

yolay/SPEAR-SearchQA-Qwen2.5-7B is a 7.6-billion-parameter agentic LLM developed by Yulei Qin and collaborators, fine-tuned with the SPEAR curriculum-based self-imitation learning framework. The model targets long-horizon, sparse-reward tasks, balancing exploration and exploitation through auxiliary tool-use rewards and the replay of successful trajectories. It performs strongly on complex question-answering benchmarks such as NQ, TriviaQA, and HotpotQA, demonstrating the gains agentic reinforcement learning can deliver in these settings.


Model Overview

yolay/SPEAR-SearchQA-Qwen2.5-7B is a 7.6 billion parameter agentic Large Language Model (LLM) developed by Yulei Qin and collaborators. It is built upon the Qwen2.5-7B-Instruct architecture and fine-tuned using the novel SPEAR (Self-imitation with Progressive Exploration for Agentic Reinforcement Learning) framework. SPEAR is a curriculum-based self-imitation learning (SIL) approach designed to train agentic LLMs on challenging long-horizon, sparse-reward tasks.
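Since the model derives from Qwen2.5-7B-Instruct, it should load like any Qwen2.5-Instruct checkpoint. A minimal sketch, assuming the repository ships the standard Qwen2.5 chat template and that `transformers` and `torch` are installed; the `answer` helper is a name introduced here for illustration:

```python
# Hypothetical helper for querying the model; requires `transformers`
# and `torch`, and downloads the ~7.6B checkpoint on first use.
def answer(question: str, max_new_tokens: int = 256) -> str:
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    model_id = "yolay/SPEAR-SearchQA-Qwen2.5-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    # Format the question with the chat template inherited from
    # Qwen2.5-Instruct, then generate a completion.
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

In an agentic search setup the helper would typically run inside a tool-calling loop rather than single-shot, but the loading and decoding steps are the same.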

Key Capabilities & Training

  • Curriculum-based Self-Imitation Learning (SIL): SPEAR balances exploration and exploitation by initially using auxiliary tool-use rewards for broad skill exploration, then strengthening self-imitation to leverage successful replayed experiences.
  • Adaptive Training: The framework stabilizes training and improves efficiency by adaptively managing entropy and integrating both on-policy and off-policy data from a replay buffer.
  • Agentic Reinforcement Learning: Optimized for multi-turn tool interactions and episode-level reward computation, enabling effective exploration in sparsely rewarded environments.
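The curriculum idea behind these bullets can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the `ReplayBuffer` class, the success threshold, and the linear schedule in `curriculum_weights` are all assumptions chosen to show how emphasis can shift from auxiliary tool-use rewards toward self-imitation on replayed successes.

```python
import random

class ReplayBuffer:
    """Keeps only successful (high-reward) trajectories for self-imitation."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.trajectories = []

    def add(self, trajectory, reward: float, threshold: float = 0.5):
        # Sparse-reward setting: store an episode only if it succeeded.
        if reward >= threshold:
            self.trajectories.append((trajectory, reward))
            self.trajectories = self.trajectories[-self.capacity:]  # bounded

    def sample(self, k: int):
        # Off-policy minibatch of past successes to imitate.
        k = min(k, len(self.trajectories))
        return random.sample(self.trajectories, k)

def curriculum_weights(step: int, total_steps: int):
    """Linearly shift emphasis from auxiliary tool-use reward (exploration)
    to self-imitation on replayed successes (exploitation)."""
    progress = min(step / total_steps, 1.0)
    aux_weight = 1.0 - progress   # auxiliary tool-use reward decays
    sil_weight = progress         # self-imitation term grows
    return aux_weight, sil_weight
```

Early in training the auxiliary weight dominates, pushing the policy to try tools broadly; as successful episodes accumulate in the buffer, the self-imitation weight takes over and the policy exploits what already worked.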

Performance & Use Cases

This model demonstrates enhanced performance on complex question-answering (QA) benchmarks when integrated with the Dr.BoT method. For instance, after 550 training steps, SPEAR-SearchQA-Qwen2.5-7B achieves an average score of 45.4 across NQ, TriviaQA, PopQA, HotpotQA, 2Wiki, MuSiQue, and Bamboogle, outperforming baseline RL methods. It is particularly well suited to applications that require robust agentic behavior and effective problem-solving in environments with delayed or infrequent rewards.