Model Overview
odats/rl_nmt_2026_04_06_16_19 is a 1 billion parameter instruction-tuned language model built on the google/gemma-3-1b-it architecture. This model distinguishes itself through its training methodology, which leverages the TRL library and specifically the GRPO (Group Relative Policy Optimization) method. GRPO, introduced in the DeepSeekMath project, is designed to push the limits of mathematical reasoning in open language models.
Key Capabilities
- Enhanced Reasoning: Fine-tuned with a method aimed at improving mathematical and general reasoning abilities.
- Instruction Following: Inherits strong instruction-following capabilities from its base model, gemma-3-1b-it.
- Context Length: Supports a substantial context window of 32768 tokens, allowing longer inputs to be processed.
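A minimal inference sketch using the Transformers chat pipeline is shown below. It assumes the model id above is published on the Hugging Face Hub and that a recent `transformers` version with chat-style pipelines is installed; the `generate` helper is illustrative and is not part of any published API.

```python
def build_messages(user_prompt: str) -> list[dict]:
    # Instruction-tuned Gemma models expect a chat-style list of
    # role/content dicts rather than a raw string prompt.
    return [{"role": "user", "content": user_prompt}]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Requires `transformers` plus network access to download the weights,
    # so the import is deferred until the function is actually called.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="odats/rl_nmt_2026_04_06_16_19")
    out = pipe(build_messages(prompt), max_new_tokens=max_new_tokens)
    # Chat pipelines return the full conversation; the last turn is the reply.
    return out[0]["generated_text"][-1]["content"]
```

Because the base model supports a 32768-token context window, long documents can be passed in a single prompt without chunking.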
Training Details
The model's training procedure utilized GRPO, a technique detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", indicating a focus on developing robust logical and mathematical processing skills. The training was conducted with pinned versions of TRL, Transformers, PyTorch, Datasets, and Tokenizers, ensuring a consistent and reproducible environment.
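The core idea of GRPO can be sketched in a few lines. Instead of a learned value-function baseline, it normalizes each completion's reward against the group of completions sampled for the same prompt; this is a simplified illustration of the advantage computation described in the DeepSeekMath paper, not TRL's actual implementation.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages for one prompt's sampled completions.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so the group itself serves as the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above their group average get positive advantages (and are reinforced), while below-average ones get negative advantages, without any critic network.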
Good For
- Applications requiring improved logical and mathematical reasoning.
- Tasks benefiting from a model with strong instruction-following and a large context window.
- Developers interested in exploring models fine-tuned with advanced reinforcement learning techniques like GRPO.