Name: Klingspor/Qwen3-4B-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Klingspor

Overview of Klingspor/Qwen3-4B-SFT

This model is a supervised fine-tuned (SFT) version of Qwen3-4B, developed as part of the research presented in the paper "Intrinsic Credit Assignment for Long Horizon Interaction." Its primary function is to act as a Questioner in the game of 20 Questions, where it asks up to 20 yes-or-no questions to identify a secret common English noun.
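The game protocol described above can be sketched as a simple loop. This is a minimal illustration, not the authors' evaluation harness: `play_twenty_questions`, the scripted questioner, the keyword oracle, and the win condition (the oracle says "yes" to a question that names the secret) are all assumptions made for the example. In practice the questioner would wrap this model and the oracle would wrap a judge model.

```python
from typing import Callable, List, Tuple

MAX_QUESTIONS = 20  # question budget stated in the overview

# One (question, answer) pair per turn played so far.
History = List[Tuple[str, str]]


def play_twenty_questions(
    questioner: Callable[[History], str],
    oracle: Callable[[str], str],
    secret: str,
    max_questions: int = MAX_QUESTIONS,
) -> Tuple[bool, History]:
    """Run one game and return (solved, history)."""
    history: History = []
    for _ in range(max_questions):
        question = questioner(history)  # next question given the dialogue so far
        answer = oracle(question)       # "yes" or "no"
        history.append((question, answer))
        # Assumed win condition: a "yes" to a question that names the secret.
        if answer == "yes" and secret.lower() in question.lower():
            return True, history
    return False, history


def make_scripted_questioner(questions: List[str]) -> Callable[[History], str]:
    """Hypothetical stand-in for the model: replays a fixed list of questions."""
    def questioner(history: History) -> str:
        return questions[min(len(history), len(questions) - 1)]
    return questioner


def make_keyword_oracle(secret: str, true_facts: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for the judge: answers by keyword matching."""
    def oracle(question: str) -> str:
        q = question.lower()
        if secret.lower() in q:
            return "yes"
        return "yes" if any(fact in q for fact in true_facts) else "no"
    return oracle


oracle = make_keyword_oracle("apple", ["fruit", "edible"])
questioner = make_scripted_questioner(
    ["Is it an animal?", "Is it a fruit?", "Is it an apple?"]
)
solved, history = play_twenty_questions(questioner, oracle, "apple")
for question, answer in history:
    print(f"{question} -> {answer}")
print("solved:", solved)
```

Passing the questioner and oracle as plain callables keeps the loop independent of how either side is served, so a model-backed implementation can replace the toy stand-ins without changing the game logic.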

Key Capabilities & Features

Specialized for 20 Questions: Designed to strategically ask deductive questions in a multi-turn interactive setting.
Reinforcement Learning Initialization: Serves as the starting checkpoint for subsequent reinforcement learning (RL) training with methods such as StarPO and CIA, which target long-horizon interaction.
Interactive Agent Research: Intended for research into multi-turn interactive language agents, providing a foundation for complex dialogue systems.

Training Details

The model was fine-tuned using a supervised approach on successful, filtered single-turn trajectories. The training data comprised 341 words from the COCA word list, ensuring no overlap with RL training or test sets. A Qwen3-14B model acted as the judge/oracle during this process.
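The data selection described above (keep only successful games, and keep the SFT words disjoint from the RL training and test words) might look roughly like the following. This is a sketch under stated assumptions: the `Trajectory` record, the `select_sft_trajectories` helper, and the example words are hypothetical, and the source does not say which filters were applied beyond success.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Trajectory:
    """Hypothetical record of one finished game."""
    secret: str
    solved: bool
    turns: List[Tuple[str, str]] = field(default_factory=list)


def select_sft_trajectories(
    trajectories: List[Trajectory],
    sft_words: Set[str],
    held_out_words: Set[str],
) -> List[Trajectory]:
    """Keep solved games whose secret word belongs to the SFT split."""
    # Guard against leakage between the SFT words and the RL train/test words.
    overlap = sft_words & held_out_words
    if overlap:
        raise ValueError(f"SFT words overlap with held-out words: {sorted(overlap)}")
    return [t for t in trajectories if t.solved and t.secret in sft_words]


# Toy data; the words below are illustrative and not taken from the real splits.
games = [
    Trajectory("apple", solved=True),
    Trajectory("chair", solved=False),  # dropped: the game was not solved
    Trajectory("river", solved=True),   # dropped: word is in the held-out split
]
kept = select_sft_trajectories(games, {"apple", "chair"}, {"river"})
print([t.secret for t in kept])
```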

Intended Use Cases

Playing 20 Questions: Directly usable as an agent to play the 20 Questions game.
RL Training Base: Ideal as an initial checkpoint for developing and training advanced RL-based interactive agents.
Research: Supports research into interactive language models and intrinsic credit assignment.

For more technical details, refer to the paper "Intrinsic Credit Assignment for Long Horizon Interaction" and the associated GitHub repository.
