Name: Bluebox85033/cogito-3b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Bluebox85033

Overview

Cogito-3B is a 3.1-billion-parameter model that Bluebox85033 fine-tuned from Qwen2.5-3B using Group Relative Policy Optimization (GRPO). Training used only verifiable-correctness rewards, with no reasoning demonstrations, learned reward models, or preference data. It followed a two-stage curriculum, first Countdown puzzles and then general math problems, to strengthen the model's reasoning.
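To make the "group relative" part of GRPO concrete, the sketch below shows the usual advantage computation: several completions are sampled for one prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is a generic illustration, not the author's training code; the function name, the `eps` constant, and the choice of population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other completions for the same prompt.

    `eps` (an assumed constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt; 1.0 means the checker accepted the answer.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every completion failed carries no learning signal.
flat = group_relative_advantages([0.0, 0.0, 0.0])
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, while a group with identical rewards yields advantages of zero.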

Key Capabilities

Enhanced Mathematical Reasoning: Improves performance on mathematical tasks, particularly Countdown puzzles and general math problems.
Explicit Reasoning Traces: Generates an explicit <think> … </think> trace before the final <answer>, so the step-by-step reasoning can be inspected.
Verifiable Reward Training: Trained with reinforcement learning from verifiable rewards (RLVR), a setup similar to DeepSeek-R1-Zero, so rewards come from checking answers for correctness rather than from subjective preference data.
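The trace format and the idea of a verifiable reward can be sketched together. The example below splits a completion into its reasoning and answer, then scores a Countdown-style answer by checking the arithmetic. It is an illustration under stated assumptions, not the reward used in training: the function names are hypothetical, the answer is assumed to be wrapped in a closing </answer> tag, and the Countdown rule is assumed to be "each given number at most once, using +, -, * and /".

```python
import ast
import operator
import re
from collections import Counter
from fractions import Fraction

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def split_trace(text):
    """Return (reasoning, answer); either is None when its tags are missing."""
    think_end = text.find("</think>")
    reasoning = None
    if think_end != -1:
        # The opening <think> may already be part of the prompt, so it is optional here.
        reasoning = text[:think_end].replace("<think>", "", 1).strip()
    match = ANSWER_RE.search(text, max(think_end, 0))
    answer = match.group(1).strip() if match else None
    return reasoning, answer


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording each integer it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Return 1.0 if the expression hits the target using only the given numbers."""
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) - Counter(numbers):  # a number reused or not in the pool
        return 0.0
    return 1.0 if value == target else 0.0


completion = "<think>\n25 - 5 = 20, 20 * 4 = 80\n</think>\n<answer>(25 - 5) * 4</answer>"
reasoning, answer = split_trace(completion)
reward = countdown_reward(answer, [25, 5, 4, 3], 80)
```

Parsing the expression with `ast` rather than calling `eval` keeps the checker limited to integer arithmetic, and `Fraction` avoids floating-point error in division.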

Performance Highlights

Countdown: 64.1% solve rate (up from 7.8% for the base model).
GSM8K: 82.3% (up from 81.0%).
MATH-500: 64.3% (up from 55.0%).

Intended Use

Cogito-3B is designed as a base completion model, not a general-purpose instruction or chat model. For best results it needs a specific prompt format: the user's query followed by Assistant: <think>\n, which elicits the reasoning trace and final answer. It is best suited to applications that need structured mathematical problem-solving with checkable reasoning.
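A minimal sketch of building such a prompt is shown below. The only part taken from the description above is the trailing Assistant: <think>\n cue; the helper name and the newline separator between the query and the cue are assumptions, and any preamble before the query is not specified here.

```python
REASONING_CUE = "Assistant: <think>\n"


def build_prompt(query, separator="\n"):
    """Append the reasoning cue to a user query.

    `separator` is an assumption; check the full model card for the exact
    text expected between the query and the cue.
    """
    return f"{query.strip()}{separator}{REASONING_CUE}"


prompt = build_prompt("Using the numbers 25, 5, 4 and 3, make 80.")
```

Because the prompt already ends with the opening <think> tag, the completion is expected to begin with the reasoning itself, so it should be sent to a plain completion endpoint rather than wrapped in a chat template.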

Full Model Card (README)