Model Overview
odats/rl_nmt_2026_04_10_07_53 is a 1-billion-parameter language model fine-tuned from the google/gemma-3-1b-it base model. It was trained with TRL (Transformer Reinforcement Learning), Hugging Face's library for post-training language models with reinforcement learning.
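A minimal inference sketch with the transformers library (the generation settings below are illustrative, not tuned values from the training run):

```python
# Sketch: load the fine-tuned model and answer a prompt.
# Requires network access to the Hugging Face Hub and enough memory
# for a 1B-parameter model; max_new_tokens is an arbitrary choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "odats/rl_nmt_2026_04_10_07_53"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    # Instruction-tuned Gemma models expect their chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens; return only the newly generated text.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```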
Key Capabilities
- Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in open language models.
- Extended Context Window: It supports a context length of 32,768 tokens, allowing it to process and generate long sequences of text.
- Instruction Following: As a fine-tuned version of an instruction-tuned model, it is capable of following user instructions effectively.
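For instruction following, prompts should use the Gemma family's turn-based chat format. The sketch below renders that format by hand to make it visible; in practice, `tokenizer.apply_chat_template` applies the exact template shipped with the model and should be preferred:

```python
# Illustrative only: the Gemma-style turn format, written out manually.
# Prefer tokenizer.apply_chat_template for real use.
def format_gemma_chat(messages: list[dict]) -> str:
    """Render {"role", "content"} messages into a Gemma-style prompt string."""
    out = []
    for m in messages:
        # Gemma uses "model" (not "assistant") as the responder role name.
        role = "model" if m["role"] == "assistant" else m["role"]
        out.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(out)

prompt = format_gemma_chat([{"role": "user", "content": "What is 12 * 7?"}])
```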
Training Details
The model was trained with GRPO via the TRL framework; training logs can be inspected on Weights & Biases. Framework versions: TRL 1.0.0, Transformers 4.57.6, PyTorch 2.10.0, Datasets 4.8.4, Tokenizers 0.22.2.
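GRPO works by sampling a group of completions per prompt, scoring each with a scalar reward, and pushing the policy toward completions that beat the group average. In TRL, rewards come from user-supplied reward functions. The function below is a hypothetical sketch (its name, signature, and answer format are assumptions for illustration; TRL's actual reward-function interface passes completions plus dataset columns as keyword arguments):

```python
# Hypothetical GRPO-style correctness reward for math answers:
# score 1.0 when the last number in a completion matches the reference.
import re

def accuracy_reward(completions: list[str], answers: list[str]) -> list[float]:
    rewards = []
    for completion, answer in zip(completions, answers):
        # Extract every integer/decimal in the completion text.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        # The final number is treated as the model's answer.
        rewards.append(1.0 if numbers and numbers[-1] == answer else 0.0)
    return rewards

rewards = accuracy_reward(
    ["12 * 7 = 84. The answer is 84.", "I think the answer is 85."],
    ["84", "84"],
)
# rewards -> [1.0, 0.0]
```

Binary exact-match rewards like this are common for GRPO math training because they are cheap, unambiguous, and hard to exploit compared with learned reward models.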
Good For
- Applications requiring strong mathematical reasoning.
- Tasks benefiting from a large context window.
- Instruction-following scenarios where the base Gemma-3-1b-it capabilities are desired with enhanced reasoning.