Bluebox85033/cogito-3b
Cogito-3B is a 3.1 billion parameter causal language model developed by Bluebox85033, fine-tuned from Qwen2.5-3B using Group Relative Policy Optimization (GRPO). It specializes in mathematical reasoning and problem-solving, particularly for Countdown puzzles and general math, by generating explicit reasoning traces before providing an answer. The model was trained with a two-stage curriculum and verifiable-correctness rewards, raising the Countdown solve rate from 7.8% to 64.1% and the MATH-500 score from 55.0% to 64.3%.
Overview
Cogito-3B is a 3.1 billion parameter model, fine-tuned from Qwen2.5-3B by Bluebox85033 using Group Relative Policy Optimization (GRPO). Training used only verifiable-correctness rewards, with no reasoning demonstrations, learned reward models, or preference data. It followed a two-stage curriculum, first on Countdown puzzles and then on general math problems, to strengthen the model's reasoning capabilities.
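To make the "group relative" part of GRPO concrete, the sketch below normalises the rewards of several completions sampled for one prompt against that group's mean and standard deviation. This is the general GRPO idea, not the exact training code for Cogito-3B: the function name, the `eps` constant, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each completion relative to the others sampled for the same prompt.

    `eps` (an assumed value) avoids division by zero when every completion
    in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt; two were verified correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, so no learned value or reward model is needed. When all completions in a group score the same, every advantage is zero and that prompt contributes no learning signal.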
Key Capabilities
- Enhanced Mathematical Reasoning: Demonstrates improved performance on mathematical tasks, particularly Countdown puzzles and general math problems.
- Explicit Reasoning Traces: Generates an explicit <think> … </think> trace before producing a final <answer>, allowing for step-by-step reasoning inspection.
- Verifiable Reward Training: Trained using a reinforcement learning from verifiable rewards (RLVR) setup, similar to DeepSeek-R1-Zero, so the reward depends on checkable correctness rather than on subjective preference data.
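A verifiable-correctness reward for Countdown can be checked entirely by program. The sketch below is one possible checker, not the reward function actually used in training: it assumes the final answer is an arithmetic expression inside <answer> tags, that only +, -, * and / are allowed, that each provided number may be used at most once, and that the reward is a binary 0.0 or 1.0. The name `countdown_reward` is hypothetical.

```python
import ast
import operator
import re
from collections import Counter
from fractions import Fraction

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node, used):
    """Evaluate an arithmetic AST exactly, recording each integer literal it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(completion, numbers, target):
    """Return 1.0 if the <answer> expression reaches `target` using only `numbers`."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    used = []
    try:
        tree = ast.parse(match.group(1).strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Assumed rule: no number may be used more often than it was provided.
    if Counter(used) - Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


good = "25 - 5 = 20, 20 * 3 = 60</think>\n<answer>(25 - 5) * 3</answer>"
reward = countdown_reward(good, [25, 5, 3, 7], 60)
```

Parsing with `ast` rather than calling `eval` keeps the checker from executing arbitrary model output, and `Fraction` avoids floating-point error when division is involved.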
Performance Highlights
- Achieved a 64.1% solve rate on Countdown (up from 7.8% for the base model).
- Improved GSM8K score to 82.3% (from 81.0%).
- Increased MATH-500 score to 64.3% (from 55.0%).
Intended Use
Cogito-3B is designed as a base completion model, not a general-purpose instruction or chat model. It requires a specific prompt format for optimal performance, where the user's query is followed by Assistant: <think>\n to elicit the reasoning trace and final answer. It is particularly suited for applications requiring structured mathematical problem-solving and verifiable reasoning.
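The prompt format described above can be sketched with two small helpers: one that appends the `Assistant: <think>\n` suffix to a query, and one that splits the generated text into its reasoning trace and final answer. The helper names, the newline placed between the query and `Assistant:`, and the assumption that the completion closes its trace with </think> before an <answer> block are illustrative choices, not details confirmed by the model card.

```python
import re

# Suffix taken from the model card; it opens the reasoning trace for the model.
ASSISTANT_PREFIX = "Assistant: <think>\n"


def build_prompt(query):
    """Append the assistant prefix to the user's query (newline separator is assumed)."""
    return query.strip() + "\n" + ASSISTANT_PREFIX


def split_completion(completion):
    """Split generated text into (reasoning, answer); answer is None if not found."""
    reasoning, _, rest = completion.partition("</think>")
    match = re.search(r"<answer>(.*?)</answer>", rest, re.DOTALL)
    answer = match.group(1).strip() if match else None
    return reasoning.strip(), answer


prompt = build_prompt("Using 25, 5 and 3, make 60.")
reasoning, answer = split_completion(
    "25 - 5 = 20 and 20 * 3 = 60.\n</think>\n<answer>(25 - 5) * 3</answer>"
)
```

Because the prompt already ends with the opening <think> tag, the completion begins inside the reasoning trace, which is why `split_completion` looks only for the closing tag.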