Klingspor/StarPO-1.7B

Parameters: 1.7B
Tensor type: BF16
Context length: 40,960
Released: January 14, 2026
License: apache-2.0

Overview

Klingspor/StarPO-1.7B is a 1.7-billion-parameter language model, a reinforcement learning (RL) fine-tune of Qwen3-1.7B. Its primary function is to play the Questioner in the classic game of 20 Questions, asking strategic yes-or-no questions to identify a secret common English noun. The model was developed and released as a baseline for the paper "Intrinsic Credit Assignment for Long Horizon Interaction."
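
A minimal inference sketch with transformers is shown below. The system prompt and conversation encoding are illustrative assumptions; this card does not document the exact prompt format used during training.

```python
# Minimal sketch: ask the model for its next 20 Questions move.
# The system prompt below is a hypothetical placeholder, not the
# exact format used during StarPO training.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Klingspor/StarPO-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [
    {"role": "system", "content": "Play 20 Questions as the questioner. "
     "Ask one strategic yes/no question per turn."},  # hypothetical prompt
    {"role": "user", "content": "Q1: Is it a living thing? Answer: no."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```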

Key Capabilities

  • 20 Questions Game Agent: Plays the questioner role, formulating deductive yes-or-no questions.
  • Multi-turn Interaction: Optimized for sequential, interactive dialogue through its StarPO training (a loop sketch follows this list).
  • Research Baseline: Serves as a comparison point for studies of intrinsic credit assignment in multi-step RL and interactive language agents.
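
The sketch below shows one way to run such a multi-turn loop, with a human typing the oracle's answers in place of the Qwen3-14B judge used during training. The prompts and stop conditions are assumptions, not the exact protocol from the paper.

```python
# Interactive 20 Questions loop sketch: the model asks, a human plays
# the oracle. Prompts are illustrative assumptions.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="Klingspor/StarPO-1.7B",
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [{
    "role": "system",  # hypothetical prompt
    "content": "Play 20 Questions as the questioner. Ask one yes/no "
               "question per turn and guess the secret noun when confident.",
}]

for turn in range(1, 21):
    # The pipeline returns the full message list, including the new
    # assistant turn, under "generated_text".
    reply = chat(messages, max_new_tokens=64)[0]["generated_text"][-1]
    print(f"Q{turn}: {reply['content']}")
    answer = input("Oracle (yes / no / correct): ").strip().lower()
    if answer == "correct":
        print("Solved!")
        break
    messages.append(reply)
    messages.append({"role": "user", "content": answer})
```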

Training Details

The model was trained with StarPO, a variant of Group Relative Policy Optimization (GRPO) adapted for multi-turn scenarios. Training started from a Qwen3-1.7B SFT checkpoint and used 1,000 words from the COCA+ RL training set. A Qwen3-14B model with chain-of-thought reasoning served as the judge/oracle during training, which was run with the VERL framework.
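
StarPO's exact objective is specified in the paper; the sketch below only illustrates the group-relative advantage that GRPO-style methods share, here assumed to be computed from one scalar reward per rollout within a group of rollouts for the same secret word.

```python
# Sketch of the group-relative advantage used by GRPO-style methods.
# StarPO's exact multi-turn formulation is in the paper; this assumes
# one scalar reward per rollout, normalized within its sampling group.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Center and scale each rollout's reward against its group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 8 rollouts against the same secret word, reward 1 on success.
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))
# Successful rollouts get a positive advantage; failures a negative one.
```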

Intended Use Cases

  • Playing 20 Questions: Directly usable as a questioner agent for the 20 Questions game.
  • RL Research: A testbed for research on multi-turn interactive language agents and on applying RL to LLMs.
  • Credit Assignment Studies: A baseline for comparing credit assignment methods in multi-step reinforcement learning.