OpenPipe/Deductive-Reasoning-Qwen-32B

Status: Warm
Visibility: Public
Parameters: 32.8B
Quantization: FP8
Context length: 32768
Released: Mar 6, 2025
License: MIT
Source: Hugging Face

OpenPipe/Deductive-Reasoning-Qwen-32B is a 32.8-billion-parameter language model from OpenPipe, created by fine-tuning Qwen 2.5 32B Instruct with reinforcement learning. The model is specifically optimized to solve complex deductive reasoning problems and was trained on the Temporal Clue dataset. With a context length of 131,072 tokens, it excels at tasks requiring logical inference and multi-step problem-solving.

Overview
OpenPipe/Deductive-Reasoning-Qwen-32B is a specialized 32.8 billion parameter language model developed by OpenPipe. It is a reinforcement learning fine-tune of the robust Qwen 2.5 32B Instruct base model. The primary objective of this fine-tuning was to enhance its capabilities in solving challenging deductive reasoning problems.

Key Capabilities

  • Deductive Reasoning: Specifically trained to excel at complex logical deduction tasks.
  • Reinforcement Learning Fine-tuning: Trained with GRPO (Group Relative Policy Optimization) to optimize performance on reasoning benchmarks.
  • Temporal Clue Dataset: Optimized using the Temporal Clue dataset, a collection of challenging deduction problems.
  • Multilingual Support: Inherits multilingual capabilities from its Qwen base, supporting languages like Chinese, English, French, Spanish, German, and more.
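The GRPO method mentioned above scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The following is a minimal sketch of that group-relative advantage computation, not OpenPipe's actual training code; the function name and the reward values are illustrative:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of sampled completions.

    Each completion's advantage is its reward normalized by the group's
    mean and standard deviation, so completions that beat their siblings
    get a positive advantage regardless of the absolute reward scale.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions to the same deduction puzzle, each rewarded
# by the fraction of clues it answered correctly (hypothetical values).
advantages = group_relative_advantages([0.25, 0.5, 0.75, 1.0])
```

Because advantages are centered within each group, they sum to zero: above-average completions are reinforced and below-average ones are penalized, which is what makes dense, verifiable rewards (like puzzle accuracy) a good fit for this training setup.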

When to Use This Model

This model is particularly well-suited for applications requiring strong logical inference and problem-solving abilities. Consider using it for:

  • Complex Reasoning Tasks: Ideal for scenarios where models need to deduce conclusions from given premises.
  • Research in RL for Reasoning: A valuable resource for researchers exploring reinforcement learning applications in enhancing LLM reasoning.
  • Benchmarking Deductive Skills: Can serve as a strong baseline or comparison point for evaluating deductive reasoning performance.
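For local experimentation with these use cases, the model can be driven through the standard Hugging Face `transformers` chat interface. The sketch below is an assumed usage pattern, not official documentation from OpenPipe: the system prompt, helper names, and puzzle text are invented placeholders, and generation requires hardware capable of serving a 32B model.

```python
def build_messages(puzzle: str) -> list[dict]:
    """Wrap a deduction puzzle in a chat-format message list."""
    return [
        {
            "role": "system",
            "content": (
                "You are an expert at deductive reasoning. Work through "
                "the clues step by step, then state your final answer."
            ),
        },
        {"role": "user", "content": puzzle},
    ]

def solve(puzzle: str, max_new_tokens: int = 1024) -> str:
    """Generate a solution with transformers (needs a large GPU)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OpenPipe/Deductive-Reasoning-Qwen-32B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    input_ids = tokenizer.apply_chat_template(
        build_messages(puzzle), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Building the prompt is cheap and runs anywhere; calling solve() is not.
messages = build_messages("Three suspects were in the manor between 8 and 10 pm...")
```

Keeping prompt construction separate from generation makes it easy to reuse the same message format against a hosted, OpenAI-compatible endpoint instead of a local checkpoint.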