Name: VikramR/cypherbench-grpo-3 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: VikramR

Model Overview

VikramR/cypherbench-grpo-3 is a 5.1 billion parameter language model, fine-tuned from the google/gemma-4-E2B-it base model. Its development utilized the TRL framework, a library for training transformer models with reinforcement learning.

Key Capabilities

Enhanced Mathematical Reasoning: The model was specifically trained using the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach aims to significantly improve its performance on complex mathematical and logical reasoning tasks.
Instruction Following: As a fine-tuned instruction model, it is designed to respond effectively to user prompts and follow given instructions.

Training Details

The training procedure leveraged GRPO, a technique focused on optimizing models for mathematical problem-solving. The model was developed using TRL version 1.5.1, Transformers 5.10.2, Pytorch 2.11.0+cu129, Datasets 4.8.5, and Tokenizers 0.22.2.

Good For

Applications requiring strong mathematical reasoning.
Tasks involving complex problem-solving and logical deduction.
Research and development in advanced language model training techniques, particularly GRPO.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)