Klingspor/StarPO-1.7B is a 1.7-billion-parameter Qwen3-based language model fine-tuned with StarPO, a GRPO variant for multi-turn settings. Developed for the paper "Intrinsic Credit Assignment for Long Horizon Interaction," the model is trained to act as the Questioner in the game of 20 Questions: it asks strategic yes-or-no questions to deduce a secret word, making it a useful testbed for research on multi-turn interactive language agents and reinforcement learning for LLMs.
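A minimal usage sketch with the 🤗 Transformers library is shown below. The system prompt, chat-turn layout, and the `next_question` helper are illustrative assumptions, not the prompt format from the paper; adapt them to the model's actual chat template.

```python
def build_messages(history):
    """Turn a list of (question, answer) pairs into a chat history
    that ends by asking the model for its next question.

    The system prompt and turn structure here are assumptions for
    illustration, not the format used during StarPO training.
    """
    messages = [{
        "role": "system",
        "content": ("You are the Questioner in a game of 20 Questions. "
                    "Ask one strategic yes-or-no question per turn."),
    }]
    for question, answer in history:
        messages.append({"role": "assistant", "content": question})
        messages.append({"role": "user", "content": answer})
    messages.append({"role": "user", "content": "Ask your next question."})
    return messages


def next_question(history, model_id="Klingspor/StarPO-1.7B"):
    """Generate the model's next yes-or-no question for the given history."""
    # Lazy import so the prompt-building helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer.apply_chat_template(
        build_messages(history),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    out = model.generate(inputs, max_new_tokens=48)
    # Decode only the newly generated tokens.
    return tokenizer.decode(out[0][inputs.shape[-1]:],
                            skip_special_tokens=True)
```

For example, `next_question([("Is it a living thing?", "Yes."), ("Is it an animal?", "No.")])` would prompt the model to narrow down the secret word within the plant/other-living-thing category.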