Jarrodbarnes/qwen3-0.6B-interleaved-thinking
Jarrodbarnes/qwen3-0.6B-interleaved-thinking is an experimental 0.6-billion-parameter research model derived from Qwen/Qwen3-0.6B-Base. It explores whether a small base model can learn an interleaved thought interface and make that interface rewardable through Reinforcement Learning Mid-Training (RLMT). The model is a research artifact for studying the emergence of a thought-conditioned continuation interface, not a production assistant: it is intended for small-scale thinking mid-training experiments and causal thought-use probes, and it demonstrates that thought channels can behaviorally influence suffix prediction.
What is Jarrodbarnes/qwen3-0.6B-interleaved-thinking?
This is a small, experimental 0.6-billion-parameter research model based on Qwen/Qwen3-0.6B-Base. Its primary purpose is to investigate whether a base model can develop an "interleaved thought" interface during continued pretraining and mid-training, and whether that interface can then be made rewardable. It is not an instruction-tuned assistant but a research artifact for studying these training methodologies.
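Loading the model follows the standard Hugging Face transformers pattern. A minimal sketch follows; the generation settings and prompt are illustrative only, and the exact format of any emitted thought spans is not specified on this card:

```python
# Minimal loading/generation sketch using the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jarrodbarnes/qwen3-0.6B-interleaved-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The tides on Earth are driven primarily by"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
# Keep special tokens in the decode so any interleaved thought markers stay visible.
print(tokenizer.decode(output[0], skip_special_tokens=False))
```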
Key Training & Findings
The model was developed with a three-stage training pipeline:
- Self-improving continued pretraining: Selected for judged-better continuations using an Online DPO-style approach (see the first sketch after this list).
- Interleaved-thinking SFT: Taught the model to integrate short, local thoughts within ordinary text.
- Reinforcement Learning Mid-Training (RLMT): Rewarded thought-conditioned suffix prediction, demonstrating that the thought interface became behaviorally relevant (see the second sketch after this list).
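As a rough illustration of the first stage, a judged-better selection loop might look like the sketch below. The `judge` callable, the sampling settings, and the pair format are assumptions; the card does not document the actual judge or data pipeline used.

```python
# Hypothetical sketch of judged-better continuation selection (Online
# DPO-style): sample two continuations per prefix, score both with a judge,
# and keep a preferred/dispreferred pair. `judge` is a stand-in callable.
import torch

def make_preference_pair(model, tokenizer, prefix, judge, max_new_tokens=64):
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    candidates = []
    for _ in range(2):
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=True)
        candidates.append(tokenizer.decode(out[0, ids.size(1):], skip_special_tokens=True))
    scores = [judge(prefix, c) for c in candidates]  # higher = judged better
    better = int(scores[1] > scores[0])
    return {"prompt": prefix, "chosen": candidates[better], "rejected": candidates[1 - better]}
```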
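A thought-conditioned suffix reward, in the spirit of the RLMT stage, could be scored roughly as follows. The `<think>...</think>` delimiters and the mean-log-prob reward shape are assumptions, not the card's documented recipe:

```python
import torch

def thought_conditioned_suffix_reward(model, tokenizer, prefix, thought, suffix):
    """Mean log-prob of `suffix` given `prefix` plus an interleaved thought.

    Hypothetical reward shape; the <think> delimiters are an assumption.
    """
    context = f"{prefix} <think>{thought}</think> "
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    suf_ids = tokenizer(suffix, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, suf_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t+1, so slice the positions that
    # predict the suffix tokens.
    preds = logits[:, ctx_ids.size(1) - 1 : -1, :].log_softmax(dim=-1)
    token_logps = preds.gather(-1, suf_ids.unsqueeze(-1)).squeeze(-1)
    return token_logps.mean().item()
```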
Key findings: continued pretraining improved judged continuation quality; SFT installed the thought interface (reducing thought-token NLL); and RLMT made that interface rewardable. Causal thought-use probes confirmed that thought text was not mere formatting: swapping in unrelated thoughts sharply reduced suffix reward.
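A causal thought-use probe of the kind described above can reuse the hypothetical reward helper from the previous sketch: score the suffix under the matched thought and under an unrelated one, then compare. The example strings here are illustrative:

```python
# Hypothetical probe: if the thought channel is causally used, swapping in an
# unrelated thought should lower the suffix score, as the card reports.
prefix = "To add 47 and 38, first"
suffix = "add the tens, then the ones, giving 85."
thought_matched = "47 + 38 = 40 + 30 + 7 + 8 = 85"
thought_unrelated = "the capital of France is Paris"

r_matched = thought_conditioned_suffix_reward(model, tokenizer, prefix, thought_matched, suffix)
r_swapped = thought_conditioned_suffix_reward(model, tokenizer, prefix, thought_unrelated, suffix)
print(f"matched: {r_matched:.3f}  swapped: {r_swapped:.3f}  drop: {r_matched - r_swapped:.3f}")
```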
When to Use This Model
This model is particularly useful for:
- Conducting small-scale thinking mid-training experiments.
- Performing causal thought-use probes.
- Studying the mechanics of self-improving pretraining, interleaved SFT, and RLMT.
- Reproducing the results from the associated research blog.
Limitations
This is a 0.6B-parameter model trained with a short (200-step) RLMT budget, and it is a research artifact, not a production-ready assistant. Generated thoughts are not consistently superior to generic scaffolds, and downstream reasoning improvements were mixed. Claims should be limited to the documented small-scale experimental setup.