Model Overview
joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1 is an experimental 4-billion-parameter model fine-tuned from the Qwen/Qwen3-4B base model. Its core function is to reconstruct plausible chains of reasoning that bridge a given INSTRUCTION to a predetermined SOLUTION. Unlike typical generative models, it focuses exclusively on the process, or "thinking" trace, and leaves the original solution unchanged. This makes it suited to reasoning backfill: generating detailed thought processes for datasets where such information is not readily available, such as legacy chat logs or instruction corpora.
Key Capabilities
- Reasoning Backfill: Generates stepwise reasoning traces (<|thinking_start|> to <|thinking_end|>) that lead to a specified solution.
- Dataset Augmentation: Can create process-supervision style traces for existing instruction-solution pairs.
- Teacher Bootstrapping: Provides trace-rich examples to train or distill teacher models.
- Auditability: Produces rationales that can aid in auditing solution adherence and understanding model behavior.
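A backfill request can be assembled as a ChatML-style prompt that pairs the INSTRUCTION with the target SOLUTION. This is a minimal sketch: the system wording and the INSTRUCTION/SOLUTION field labels are illustrative assumptions, not a format published with the model.

```python
# Sketch: build a ChatML prompt asking the model to backfill a reasoning
# trace for a fixed instruction/solution pair. The system wording and the
# INSTRUCTION/SOLUTION labels are illustrative assumptions.
def build_backfill_prompt(instruction: str, solution: str) -> str:
    system = (
        "Reconstruct a plausible chain of reasoning, wrapped in "
        "<|thinking_start|>...<|thinking_end|>, that leads from the "
        "INSTRUCTION to the given SOLUTION. Do not alter the SOLUTION."
    )
    user = f"INSTRUCTION:\n{instruction}\n\nSOLUTION:\n{solution}"
    # ChatML framing, matching the chat template noted in Training Details.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_backfill_prompt("Compute 2 + 2.", "4")
```

In practice you would pass the instruction/solution pair through the tokenizer's own chat template rather than hand-building the string; the sketch only makes the prompt shape explicit.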
Training Details
The model was trained on a single H100 GPU for 4 epochs, using the adamw_bnb_8bit optimizer with a learning rate of 2.5e-5 and a cosine learning rate schedule with 40 warmup steps. Training used a sequence length of 1024 tokens and the ChatML chat template.
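The hyperparameters above can be collected into a config fragment. The key names below follow common Hugging Face trainer conventions and are illustrative; this is not the exact training config used.

```python
# Training hyperparameters from the model card, expressed as a config
# dict. Key names follow common Hugging Face trainer conventions and
# are illustrative, not the exact config used.
training_config = {
    "base_model": "Qwen/Qwen3-4B",
    "num_train_epochs": 4,
    "optim": "adamw_bnb_8bit",
    "learning_rate": 2.5e-5,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 40,
    "max_seq_length": 1024,
    "chat_template": "chatml",
}
```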
Limitations
- The generated traces are plausible reconstructions, not actual ground-truth cognitive processes.
- The model may over-rationalize if the provided solution is underspecified.
Recommended Usage
For optimal performance, recommended sampling parameters include temperature: 0.7–1.0, top_p: 0.9, and min_p: 0.05. For stricter adherence to the solution, a lower temperature (e.g., 0.5–0.7) is advised.
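Because the model is expected to leave the solution unchanged, a simple post-processing check can separate the trace from the rest of the completion and compare the remainder against the target SOLUTION. The tag names come from the capabilities list above; the parsing and adherence logic is an illustrative sketch, not a documented utility.

```python
import re

# Sketch: split a generated completion into its reasoning trace and the
# final answer, then check the answer still matches the target solution.
# Assumes the trace is wrapped in the <|thinking_start|>/<|thinking_end|>
# tags described above; the adherence check itself is illustrative.
def split_trace(completion: str) -> tuple[str, str]:
    m = re.search(
        r"<\|thinking_start\|>(.*?)<\|thinking_end\|>", completion, re.DOTALL
    )
    trace = m.group(1).strip() if m else ""
    answer = re.sub(
        r"<\|thinking_start\|>.*?<\|thinking_end\|>", "", completion,
        flags=re.DOTALL,
    ).strip()
    return trace, answer

def adheres(completion: str, solution: str) -> bool:
    _, answer = split_trace(completion)
    return answer == solution.strip()
```

A check like this is a cheap guard against the over-rationalization failure mode noted in Limitations: completions whose answer drifts from the supplied solution can be filtered out before the trace is added to a dataset.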