cyberagent/CAT-Thinking-8B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 28, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

cyberagent/CAT-Thinking-8B is an 8 billion parameter language model developed by CyberAgent, based on the Qwen3-Swallow-v0.2 architecture with a 32K context length. This model is uniquely trained to generate reasoning traces specifically in Japanese, even when processing English inputs. It excels in Japanese reasoning tasks, particularly in coding and mathematical problem-solving, while maintaining performance on English tasks.

Loading preview...

CAT-Thinking-8B: Japanese Reasoning Model

CAT-Thinking-8B, developed by CyberAgent, is an 8 billion parameter language model built upon the Qwen3-Swallow-v0.2 architecture, which itself is a Japanese-optimized continual pretraining of Qwen3. Its primary distinction lies in its ability to "think in Japanese", generating detailed reasoning traces in Japanese even when given English prompts. This capability is achieved through a sophisticated training procedure involving GRPO with a warm-start, utilizing a teacher dataset from gpt-oss-120b translated into Japanese.

Key Capabilities & Features

  • Japanese Reasoning: Uniquely trained to produce reasoning steps in Japanese for problem-solving.
  • Multilingual Input: Can process instructions in English and generate Japanese reasoning.
  • Problem Solving: Evaluated on coding (mbpp, HumanEval, JHumanEval, LiveCodeBenchv6) and math tasks (GPQA, PolyMath, AIME 26) in both Japanese and English.
  • Context Length: Supports a context length of 32,768 tokens.
  • Output Length: Designed for a maximum output token length of 4096, with recommendations for larger max_new_tokens for complex problems.
  • Repetition Mitigation: Suggests repetition_penalty=1.05 or higher to reduce repetitive outputs.

Training & Performance Notes

The model's training involved a multi-stage GRPO process, initially focusing on format adherence (Japanese reasoning trace, Japanese main text, instructed format) with a permissive reward model, followed by a strict reward model for correct answers. While optimized for Japanese reasoning, the training data predominantly consisted of English instructions, which may lead to underperformance on some Japanese benchmarks. The model's reasoning trace, though in Japanese, may exhibit specific learned phrases from its GRPO training.