The odats/rl_nmt_2026_04_06_16_57 model is a 1-billion-parameter language model fine-tuned from google/gemma-3-1b-it using GRPO, a reinforcement-learning method designed to strengthen mathematical reasoning in language models. With its 32,768-token context length, it is suited to tasks that demand robust logical and mathematical problem-solving and precise numerical understanding.
Model Overview
The odats/rl_nmt_2026_04_06_16_57 is a 1-billion-parameter language model, fine-tuned from the google/gemma-3-1b-it base model. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization).
Key Capabilities
- Enhanced Mathematical Reasoning: The model's training with GRPO, a method introduced in the DeepSeekMath paper, focuses on improving its ability to handle complex mathematical and logical problems.
- Instruction Following: As a fine-tuned instruction model, it responds to conversational prompts using the Gemma chat format (see the usage sketch after this list).
- Context Handling: Supports a 32,768-token context window, allowing it to process and generate longer, more complex inputs and outputs.
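
A minimal inference sketch, assuming the checkpoint loads like any Gemma-style causal LM through the standard transformers classes; the math prompt is purely illustrative:

```python
# Minimal inference sketch -- assumes the checkpoint loads like a
# standard Gemma-style causal LM; the prompt is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_06_16_57"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "If 3x + 7 = 22, what is x? Show your steps."},
]

# Gemma instruction models ship a chat template; apply it, then generate.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```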
Training Details
The model's training procedure utilized GRPO, a technique aimed at pushing the boundaries of mathematical reasoning in open language models. GRPO is a PPO-style method that drops the separate value model: for each prompt it samples a group of completions and baselines each one against the group's reward statistics, which makes reasoning-focused RL fine-tuning cheaper. This reward-driven stage is what differentiates the model from standard instruction-tuned checkpoints.
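
For orientation, the sketch below shows the general shape of a GRPO run in TRL. The dataset and the toy length-based reward are placeholders, not the actual recipe behind this checkpoint:

```python
# Hypothetical GRPO training sketch with TRL -- the dataset and reward
# function are placeholders, NOT the recipe used for this checkpoint.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="gemma-3-1b-grpo")
trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",   # the base model named in this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In practice, reasoning-focused runs replace the toy reward with a verifier, for example one that checks a completion's final numerical answer against a ground-truth label.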
Good For
- Applications requiring strong mathematical and logical problem-solving.
- Tasks where precise reasoning and numerical understanding are paramount.
- Scenarios benefiting from a model with a large context window for detailed interactions.