minpeter/Qwen3-0.6B-Instruct
Text generation · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Jan 22, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

minpeter/Qwen3-0.6B-Instruct is a 0.8 billion parameter instruction-tuned language model, forked from Qwen/Qwen3-0.6B. It has been modified specifically to serve as a training target within the PrimeIntellect-ai/verifiers framework. It differs from the base model chiefly in its adjusted chat template and reversed thinking-tag logic, which enable 'thinking mode' by default for specialized training applications. It is best suited for developers integrating it into verification and training pipelines.


Model Overview

minpeter/Qwen3-0.6B-Instruct is a 0.8 billion parameter language model derived from the original Qwen/Qwen3-0.6B base model. This version has been adapted with specific modifications to function as a training target within the PrimeIntellect-ai/verifiers framework.

Key Modifications

This model distinguishes itself through two primary changes from its original Qwen3-0.6B counterpart:

  • Chat Template Extraction: The chat_template has been moved from tokenizer_config.json into a dedicated chat_template.jinja file, aligning with the latest transformers library format.
  • Reversed Thinking Tag Logic: The logic for thinking tags has been inverted, which means the model's 'thinking mode' is enabled by default (enable_thinking=True). This adjustment is crucial for its intended use in specific training and verification scenarios.
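The effect of the reversed default can be illustrated with a small, hypothetical sketch. The real logic lives in the model's chat_template.jinja (applied via the tokenizer's `apply_chat_template`); the plain-Python function below is a simplified stand-in that only mimics the behavioral difference, using the usual Qwen3-style ChatML tokens:

```python
# Illustrative sketch (NOT the actual chat_template.jinja): shows how a
# template whose thinking logic defaults to enable_thinking=True primes the
# assistant turn with an open <think> block, while disabling it emits an
# empty <think></think> pair instead.

def build_generation_prompt(messages, enable_thinking=True):
    """Render a minimal Qwen3-style chat prompt (hypothetical simplification)."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if enable_thinking:
        # Default in this fork: the model is primed to emit its reasoning
        # inside a <think>...</think> block before the final answer.
        parts.append("<think>\n")
    else:
        # Thinking explicitly disabled: close the block immediately.
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)

msgs = [{"role": "user", "content": "Hello"}]
print(build_generation_prompt(msgs).endswith("<think>\n"))
```

With the real tokenizer, the analogous call would be `tokenizer.apply_chat_template(msgs, add_generation_prompt=True)`, where passing `enable_thinking=False` is the opt-out rather than the default.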

Intended Use Cases

This modified Qwen3-0.6B-Instruct is specifically designed for:

  • Integration with PrimeIntellect-ai/verifiers: Its primary purpose is to serve as a target model within this verification framework.
  • Specialized Training Environments: Developers requiring a model with a default 'thinking mode' for particular training or evaluation setups will find this version suitable.

For comprehensive details regarding the original model's architecture, performance benchmarks, and general usage, users should refer to the documentation of the base Qwen/Qwen3-0.6B model.