Bluebox85033/cogito-3b

Text Generation | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jul 1, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

Cogito-3B is a 3.1 billion parameter causal language model developed by Bluebox85033, fine-tuned from Qwen2.5-3B using Group Relative Policy Optimization (GRPO). It specializes in mathematical reasoning and problem-solving, particularly Countdown puzzles and general math, and generates an explicit reasoning trace before giving an answer. The model was trained with a two-stage curriculum and verifiable-correctness rewards, raising the Countdown solve rate from 7.8% to 64.1% and the MATH-500 score from 55.0% to 64.3%.


Overview

Cogito-3B is a 3.1 billion parameter model, fine-tuned from Qwen2.5-3B by Bluebox85033 using Group Relative Policy Optimization (GRPO). Training used only verifiable-correctness rewards, with no reasoning demonstrations, learned reward models, or preference data. It followed a two-stage curriculum, first Countdown puzzles and then general math problems, to strengthen the model's reasoning.
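The "group relative" part of GRPO can be illustrated with a short sketch. It assumes the commonly described formulation, in which several completions are sampled for the same prompt and each reward is standardized against the mean and standard deviation of its own group. The normalization details, group size, and reward values actually used for Cogito-3B are not stated here, so the function name, the eps constant, and the example rewards are illustrative only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own sampling group.

    `rewards` holds one scalar per completion sampled for the SAME prompt.
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions: only the first one was correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 0.0])
print(advantages)  # the correct completion gets a positive advantage

# If every completion earns the same reward, no completion is preferred.
print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

Because the baseline is the group's own mean, this setup needs no learned value model: a completion is pushed up only when it beats its siblings for the same prompt.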

Key Capabilities

  • Enhanced Mathematical Reasoning: Demonstrates improved performance on mathematical tasks, particularly Countdown puzzles and general math problems.
  • Explicit Reasoning Traces: Generates an explicit <think> … </think> trace before producing a final <answer>, allowing for step-by-step reasoning inspection.
  • Verifiable Reward Training: Trained in a reinforcement learning from verifiable rewards (RLVR) setup, similar to DeepSeek-R1-Zero, where rewards come from automatically checking answer correctness rather than from subjective preference data.
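To make "verifiable reward" concrete, here is a minimal sketch of a checker for a Countdown-style puzzle. It is not the reward function used to train Cogito-3B. It assumes the usual puzzle rules (combine the given integers with +, -, *, / to hit a target, using each number at most once), a binary 0/1 reward, and an answer wrapped in <answer> … </answer>; the closing tag and the name countdown_reward are assumptions.

```python
import ast
import operator
import re
from collections import Counter

# Arithmetic operators permitted in an answer (assumed rule set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression, recording every integer literal it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(completion, numbers, target):
    """Return 1.0 if the <answer> expression is legal and hits the target."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer, no reward
    try:
        tree = ast.parse(match.group(1).strip(), mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Reject answers that use a number more often than it was provided.
    if Counter(used) - Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0


sample = "25 - 5 = 20, and 20 * 3 = 60.</think>\n<answer>(25 - 5) * 3</answer>"
print(countdown_reward(sample, [3, 5, 25], 60))  # 1.0
```

Parsing with ast instead of calling eval keeps the checker limited to plain arithmetic, which matters when the text being scored is model output.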

Performance Highlights

  • Achieved a 64.1% solve rate on Countdown (up from 7.8% for the base model).
  • Improved GSM8K score to 82.3% (from 81.0%).
  • Increased MATH-500 score to 64.3% (from 55.0%).
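For a quick comparison, the gains can be expressed in percentage points. The snippet below only restates the scores listed above; the names SCORES and point_gain are illustrative.

```python
# Scores copied from the list above: (base model, Cogito-3B), in percent.
SCORES = {
    "Countdown": (7.8, 64.1),
    "GSM8K": (81.0, 82.3),
    "MATH-500": (55.0, 64.3),
}


def point_gain(base, tuned):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(tuned - base, 1)


for name, (base, tuned) in SCORES.items():
    print(f"{name}: +{point_gain(base, tuned)} points")
# Countdown: +56.3 points
# GSM8K: +1.3 points
# MATH-500: +9.3 points
```

The largest gain is on Countdown, the task used in the first curriculum stage, while GSM8K moves only slightly from an already high baseline.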

Intended Use

Cogito-3B is designed as a base completion model, not a general-purpose instruction or chat model. For best results it needs a specific prompt format: the user's query followed by the literal text Assistant: <think>\n, which cues the model to write its reasoning trace and then the final answer. It is particularly suited to applications that need structured mathematical problem-solving and verifiable reasoning.
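The prompt format can be sketched as a pair of helpers: one that appends the required suffix and one that splits the model's continuation into reasoning and answer. Only the Assistant: <think>\n suffix comes from the description above. The newline placed between the query and the suffix, the closing </think> and </answer> tags, and the helper names are assumptions, and any text the model expects before the query is not specified here.

```python
import re

# Suffix taken from the model description; it opens the reasoning trace.
ASSISTANT_SUFFIX = "Assistant: <think>\n"


def build_prompt(query):
    """Append the suffix to the query (the separating newline is assumed)."""
    return f"{query.strip()}\n{ASSISTANT_SUFFIX}"


def split_completion(completion):
    """Split generated text into (reasoning, answer).

    The prompt already ends with "<think>", so the completion is expected
    to start inside the reasoning trace. `answer` is None when the trace
    is never closed or no <answer> tag follows it.
    """
    reasoning, closed, rest = completion.partition("</think>")
    if not closed:
        return completion.strip(), None
    match = re.search(r"<answer>(.*?)(?:</answer>|$)", rest, re.DOTALL)
    answer = match.group(1).strip() if match else None
    return reasoning.strip(), answer


prompt = build_prompt("Using 3, 5 and 25, make 60.")
print(repr(prompt))  # 'Using 3, 5 and 25, make 60.\nAssistant: <think>\n'

# Hypothetical model output, used only to exercise the parser.
generated = "25 - 5 = 20, and 20 * 3 = 60.\n</think>\n<answer>(25 - 5) * 3</answer>"
print(split_completion(generated))
# ('25 - 5 = 20, and 20 * 3 = 60.', '(25 - 5) * 3')
```

The string returned by build_prompt would be sent to a plain text-completion endpoint rather than a chat endpoint, since a chat template would wrap the query in its own role markers.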