odats/rl_nmt_2026_04_03_17_27
odats/rl_nmt_2026_04_03_17_27 is a 1 billion parameter instruction-tuned language model developed by odats, fine-tuned from google/gemma-3-1b-it. It was trained with GRPO (Group Relative Policy Optimization), a method designed to enhance mathematical reasoning in open language models. With a context length of 32768 tokens, it is suited to tasks requiring advanced reasoning and problem-solving, particularly in mathematical domains.
Overview
odats/rl_nmt_2026_04_03_17_27 is a 1 billion parameter instruction-tuned model, fine-tuned from Google's Gemma-3-1B-IT. It was developed by odats using TRL (Transformer Reinforcement Learning), a library for post-training language models with reinforcement learning.
Key Capabilities
- Enhanced Mathematical Reasoning: This model was specifically trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the "DeepSeekMath" paper, which focuses on pushing the limits of mathematical reasoning in open language models.
- Instruction Following: As an instruction-tuned model, it is designed to understand and execute user prompts effectively.
- Extended Context Window: Supports a context length of 32768 tokens, allowing it to process long inputs and maintain conversational coherence over extended interactions.
Training Details
The model was fine-tuned with the TRL framework using GRPO (Group Relative Policy Optimization), detailed in the DeepSeekMath paper. Rather than training a separate value model, GRPO samples a group of completions for each prompt, scores them with a reward function, and computes each completion's advantage relative to the rest of its group, with the aim of improving performance on complex reasoning tasks.
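The group-relative advantage at the core of GRPO can be sketched in a few lines. This is a simplified illustration, not TRL's actual implementation (which adds clipped policy ratios, a KL penalty, and batching details): each completion's reward is standardized against the mean and standard deviation of its group.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages for one group of completions.

    rewards: scores for G completions sampled from the same prompt.
    Returns one advantage per completion: completions scoring above the
    group mean get positive advantages, those below get negative ones.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Standardize within the group; eps guards against a zero-variance group.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one math problem, rewarded 1.0 if correct.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are centered within each group, they sum to (approximately) zero: correct answers are reinforced exactly to the extent that they beat the group average, which is what makes a verifiable reward signal (such as answer correctness) effective for mathematical reasoning.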
Good For
- Applications requiring strong mathematical reasoning.
- Tasks benefiting from advanced instruction following.
- Use cases where a smaller, yet capable, model with a large context window is preferred for reasoning-intensive tasks.
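A hypothetical quick-start sketch with the Transformers library is below. It assumes the model is available on the Hugging Face Hub under the name above and, as a Gemma-3 derivative, supports the standard chat template; adjust device placement and generation settings to your setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "odats/rl_nmt_2026_04_03_17_27"


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format expected by apply_chat_template."""
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_answer("A train travels 120 km in 1.5 hours. What is its average speed?"))
```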