Klingspor/Qwen3-1.7B-SFT

License: apache-2.0 · BF16 · 32,768-token context · May 12, 2025
Overview

Klingspor/Qwen3-1.7B-SFT is a specialized supervised fine-tuned (SFT) model based on Qwen3-1.7B. Its primary function is to act as a Questioner in the game of 20 Questions, where it asks up to 20 yes-or-no questions to identify a secret common English noun. This model was developed as part of the paper "Intrinsic Credit Assignment for Long Horizon Interaction" and serves as an initialization point for subsequent reinforcement learning (RL) models like StarPO and CIA.

Key Capabilities

  • Strategic Questioning: Designed to formulate clear, concise, and strategic yes/no questions to narrow down possibilities in a deductive game.
  • Multi-turn Interaction: Optimized for sequential, interactive dialogue where previous answers inform subsequent queries.
  • RL Initialization: Functions as a strong starting checkpoint for further reinforcement learning training in interactive agent scenarios.

Training Details

The model was fine-tuned on successful, filtered trajectories drawn from a dataset of 341 words taken from the COCA word list, with a Qwen3-14B model acting as the Judge/Oracle during data generation. Words used in the RL training or test sets were explicitly excluded from the SFT data to ensure robust evaluation.
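The filtering step described above can be sketched as follows. This is an illustrative reconstruction, not the actual pipeline: the field names (`word`, `outcome`) and the word lists are hypothetical placeholders.

```python
# Hypothetical sketch of the trajectory filtering described above:
# keep only successful games whose secret word does not appear in the
# held-out RL train/test word lists. Schema and word lists are
# illustrative, not the actual dataset format.

RL_WORDS = {"apple", "river"}  # placeholder for RL train/test words

def filter_trajectories(trajectories):
    """Return trajectories that are successful and use non-RL words."""
    return [
        t for t in trajectories
        if t["outcome"] == "success" and t["word"] not in RL_WORDS
    ]

sample = [
    {"word": "piano", "outcome": "success"},
    {"word": "apple", "outcome": "success"},   # dropped: RL word
    {"word": "chair", "outcome": "failure"},   # dropped: unsuccessful game
]

kept = filter_trajectories(sample)  # only the "piano" trajectory survives
```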

Intended Use Cases

  • Playing 20 Questions: Directly usable as an AI agent to play the 20 Questions game.
  • Research on Interactive Agents: Ideal for researchers exploring multi-turn interactive language agents and intrinsic credit assignment.
  • RL Base Model: Provides a solid foundation for developing and training more advanced reinforcement learning models for long-horizon interaction tasks.
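The first use case can be sketched as a simple interaction loop. The sketch below abstracts both roles behind callables: in practice `ask_question` would wrap generation with Klingspor/Qwen3-1.7B-SFT and `answer` would wrap the Judge/Oracle (the paper used Qwen3-14B); here both are stubs so the loop structure itself is runnable.

```python
# Minimal sketch of a 20 Questions loop for a Questioner model.
# `ask_question` and `answer` are hypothetical stand-ins for the
# fine-tuned questioner and the judge model, respectively.

def play_20_questions(ask_question, answer, max_turns=20):
    """Run up to max_turns yes/no exchanges; return the transcript."""
    history = []
    for _ in range(max_turns):
        question = ask_question(history)  # questioner's next yes/no question
        reply = answer(question)          # oracle: "yes", "no", or "correct"
        history.append((question, reply))
        if reply == "correct":
            break
    return history

# Stub questioner/oracle for a secret word "dog":
secret = "dog"
questions = ["Is it an animal?", "Is it a plant?", "Is it a dog?"]

def stub_questioner(history):
    return questions[len(history)]

def stub_oracle(question):
    if secret in question.lower():
        return "correct"
    return "yes" if "animal" in question else "no"

transcript = play_20_questions(stub_questioner, stub_oracle)
```

Swapping the stubs for model-backed callables turns this into a full game; the same loop shape is what an RL setup would roll out when using this checkpoint as the starting policy.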

For more details, refer to the research paper and the associated codebase.