odats/rl_nmt_2026_04_07_08_22

Text Generation
Model Size: 1B · Quantization: BF16 · Context Length: 32k · Published: Apr 7, 2026 · Architecture: Transformer

odats/rl_nmt_2026_04_07_08_22 is a 1 billion parameter language model fine-tuned from google/gemma-3-1b-it using the TRL framework and the GRPO method, with the goal of strengthening mathematical reasoning. It is suited to tasks that require logical and mathematical problem-solving while building on the strengths of the base Gemma architecture.
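
A minimal quick-start sketch, assuming the standard Transformers text-generation pipeline and the chat format inherited from gemma-3-1b-it; the prompt and generation settings are illustrative:

```python
# Minimal quick-start: load the checkpoint with the Transformers pipeline.
# The chat-style prompt is illustrative; Gemma instruction models expect
# the messages format shown here.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="odats/rl_nmt_2026_04_07_08_22",
    torch_dtype="bfloat16",  # matches the BF16 weights listed on this card
)

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
output = generator(messages, max_new_tokens=256)

# The pipeline returns the full conversation; the last message is the reply.
print(output[0]["generated_text"][-1]["content"])
```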


Overview

odats/rl_nmt_2026_04_07_08_22 is a 1 billion parameter language model fine-tuned from the google/gemma-3-1b-it base model using the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

  • Enhanced Mathematical Reasoning: This model was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This reflects a focus on improving the model's ability to handle complex mathematical problems and logical reasoning tasks; see the training sketch after this list.
  • Instruction Following: As a fine-tuned version of an instruction-tuned model (gemma-3-1b-it), it is designed to follow user instructions effectively.
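
The card does not include the training script itself, so the following is only a sketch of what GRPO fine-tuning of the base model with TRL's GRPOTrainer could look like. The toy dataset, placeholder reward function, and hyperparameters are illustrative assumptions, not the recipe behind this checkpoint:

```python
# Sketch of GRPO fine-tuning with TRL; the dataset, reward, and hyperparameters
# are placeholders, not the actual training configuration of this model.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a "prompt" column.
train_dataset = Dataset.from_dict(
    {"prompt": ["What is 7 * 8?", "Solve for x: 2x + 3 = 11."] * 8}
)

# Hypothetical reward: favor completions containing a digit, standing in
# for a real math-correctness checker.
def digit_reward(completions, **kwargs):
    return [1.0 if any(ch.isdigit() for ch in text) else 0.0 for text in completions]

training_args = GRPOConfig(
    output_dir="rl_nmt_grpo",
    num_generations=4,          # completions sampled per prompt (the "group" in GRPO)
    max_completion_length=128,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",  # the base model named on this card
    reward_funcs=digit_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

A real run would replace digit_reward with a verifier that scores mathematical correctness, for example by comparing an extracted final answer against a reference solution.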

Training Details

  • Base Model: google/gemma-3-1b-it
  • Training Framework: TRL (version 1.0.0)
  • Training Method: GRPO, as detailed in the DeepSeekMath research.

Good For

  • Applications requiring improved mathematical problem-solving.
  • Tasks benefiting from enhanced logical reasoning capabilities.
  • Instruction-following scenarios that call for the base Gemma model's strengths combined with stronger reasoning.
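
For finer control over generation than the pipeline offers, the explicit load path below is a sketch that assumes the checkpoint ships the standard Gemma chat template; the word problem is an illustrative prompt:

```python
# Explicit load path: tokenizer + model, with the chat template applied manually.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_07_08_22"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "A train travels 120 km in 1.5 hours. What is its average speed?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```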