invincible-jha/SynLogic-32B

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 16, 2026 · License: MIT · Architecture: Transformer · Open Weights · Cold

SynLogic-32B is a 32.8 billion parameter reasoning model developed by invincible-jha, built upon the Qwen2.5-32B-Base architecture. Trained using reinforcement learning on the SynLogic dataset, it excels at complex logical reasoning tasks, including Sudoku and Game of 24. This model demonstrates strong generalization capabilities to mathematical problem-solving, achieving state-of-the-art performance on the BBEH benchmark among open-source logical reasoning models.

SynLogic-32B: Advanced Logical Reasoning Model

SynLogic-32B, developed by invincible-jha, is a 32.8 billion parameter model based on Qwen2.5-32B-Base, specifically fine-tuned for advanced logical reasoning. It leverages a novel reinforcement learning approach on the comprehensive SynLogic dataset, which includes 35 diverse logical reasoning tasks such as Sudoku, Game of 24, Cipher, and Arrow Maze. A key innovation is the verifiability of all training data, enabling highly effective reinforcement learning through binary rewards based on format adherence and correctness.

Key Capabilities

  • Comprehensive Logical Reasoning: Proficient in a wide array of logical puzzles and challenges.
  • Strong Generalization: Demonstrates the ability to transfer learned logical reasoning skills to mathematical problem-solving without explicit mathematical training.
  • Verifiable Training: Utilizes a unique dataset where all training samples can be automatically verified, enhancing model reliability and performance.
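Because every SynLogic sample can be checked automatically, the binary reward described above reduces to two tests: format adherence and answer correctness. The sketch below illustrates the idea for a Game of 24 instance; the tag format and function names are illustrative assumptions, not SynLogic's actual implementation.

```python
import re

def check_format(response: str) -> bool:
    """Format adherence: reasoning wrapped in <think> tags, final answer in
    <answer> tags. (Tag names are an assumption for illustration only.)"""
    return bool(re.fullmatch(r"\s*<think>.*</think>\s*<answer>.*</answer>\s*",
                             response, re.DOTALL))

def check_game_of_24(answer_expr: str, numbers: list[int]) -> bool:
    """Correctness: the expression must use each given number exactly once
    and evaluate to 24."""
    used = sorted(int(n) for n in re.findall(r"\d+", answer_expr))
    if used != sorted(numbers):
        return False
    try:
        # eval is acceptable here only because the verifier controls the input
        return abs(eval(answer_expr) - 24) < 1e-6
    except (SyntaxError, ZeroDivisionError):
        return False

def binary_reward(response: str, numbers: list[int]) -> float:
    """Reward is 1.0 only when the response is well-formatted AND correct."""
    if not check_format(response):
        return 0.0
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL).group(1)
    return 1.0 if check_game_of_24(answer, numbers) else 0.0

resp = "<think>8-2=6 and 6*4=24, keep the 1</think><answer>(8-2)*4*1</answer>"
print(binary_reward(resp, [8, 2, 4, 1]))  # 1.0 for a valid solution
```

An all-or-nothing reward like this gives the RL loop an unambiguous training signal, which is what makes the dataset's verifiability so useful.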

Performance Highlights

SynLogic-32B achieves a notable +6 point improvement over DeepSeek-R1-Distill-Qwen-32B on the challenging BBEH benchmark, establishing it as a leading open-source model for logical reasoning. While excelling in BBEH, it also maintains competitive performance on KOR-Bench and BBH. The model was trained using the GRPO (Group Relative Policy Optimization) algorithm on 33,000 SynLogic-Hard samples with controlled difficulty.
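GRPO's core idea is to score each sampled response against the mean reward of its own sampling group, replacing a learned value baseline. A minimal sketch of that group-relative advantage computation, as a generic illustration rather than the actual SynLogic training code:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group Relative Policy Optimization baseline: each response's advantage
    is its reward normalized by the group's mean and standard deviation,
    so no separate value network is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# With binary rewards, the correct samples in a group get positive advantage
# and the incorrect ones negative, pushing the policy toward verified answers.
group_rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = grpo_advantages(group_rewards)
```

Pairing this with the binary verifiable rewards above means groups with mixed outcomes carry the learning signal, while all-correct or all-wrong groups contribute near-zero advantages.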

Good for

  • Applications requiring robust logical deduction and problem-solving.
  • Tasks involving complex puzzles and reasoning challenges.
  • Research into advanced reasoning capabilities and generalization in LLMs.