odats/rl_nmt_2026_04_09_07_36

Text generation · Model size: 1B · Quantization: BF16 · Context length: 32k · Published: Apr 9, 2026 · Architecture: Transformer

The odats/rl_nmt_2026_04_09_07_36 model is a 1-billion-parameter instruction-tuned language model fine-tuned from google/gemma-3-1b-it. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper for strengthening mathematical reasoning. The model is optimized for tasks that benefit from enhanced reasoning, particularly where mathematical understanding matters, and supports a context length of 32768 tokens.


Model Overview

odats/rl_nmt_2026_04_09_07_36 is a 1-billion-parameter instruction-tuned language model built on google/gemma-3-1b-it. Its 32768-token context length allows it to process longer inputs and generate more extensive responses.
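A minimal usage sketch with the standard transformers text-generation pipeline. The `build_messages` and `generate` helpers are illustrative assumptions, not part of the model's API, and the exact shape of the pipeline's chat output can vary between transformers versions:

```python
def build_messages(question: str) -> list[dict]:
    # Chat-format input expected by Gemma-style instruction-tuned models.
    return [{"role": "user", "content": question}]


def generate(question: str, model_id: str = "odats/rl_nmt_2026_04_09_07_36") -> str:
    # Lazy import so build_messages stays usable without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    out = generator(build_messages(question), max_new_tokens=256)
    # With chat-format input, recent transformers versions return the full
    # message list; the last entry is the model's reply.
    return out[0]["generated_text"][-1]["content"]
```

Running `generate("What is 12 * 7?")` downloads the model weights on first use, so a GPU (or patience) is recommended even at 1B parameters.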

Key Capabilities

  • Enhanced Reasoning: This model was specifically trained using GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper. This training approach aims to improve the model's reasoning abilities, particularly in mathematical contexts.
  • Instruction Following: As an instruction-tuned model, it is designed to understand and execute user commands effectively, making it suitable for conversational agents and task-oriented applications.
  • TRL Framework: The model's fine-tuning process leveraged the TRL (Transformer Reinforcement Learning) library, Hugging Face's framework for training language models with reinforcement learning.
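The core idea behind GRPO can be sketched in a few lines: instead of learning a separate value critic, it samples a group of completions per prompt and baselines each completion's reward against the group. This is a simplified illustration of that advantage computation, not the model's actual training code:

```python
import math


def group_relative_advantages(rewards: list[float]) -> list[float]:
    # GRPO normalizes each sampled completion's reward by the mean and
    # standard deviation of its group, so completions compete against
    # siblings from the same prompt rather than a learned value baseline.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

Completions scoring above the group mean get positive advantages (their tokens are reinforced); below-mean completions get negative ones.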

Good For

  • Reasoning-intensive tasks: Its GRPO training suggests a strong aptitude for problems requiring logical deduction and mathematical understanding.
  • Applications requiring long context: The 32768-token context window makes it suitable for summarizing long documents, extended conversations, or complex code analysis.
  • Exploration of GRPO-trained models: Developers interested in models fine-tuned with advanced reinforcement learning techniques for reasoning tasks.
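For the long-document use cases above, inputs still have to fit the 32768-token window. A rough chunking helper is sketched below; the 4-characters-per-token ratio is a common English-text heuristic, not a property of this model's tokenizer, so use the actual tokenizer for exact counts:

```python
def chunk_text(text: str, max_tokens: int = 32768, chars_per_token: int = 4) -> list[str]:
    # Approximate the token budget in characters and split on paragraph
    # boundaries, reserving half the window for the model's response.
    budget = max_tokens * chars_per_token // 2
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be summarized independently and the partial summaries combined in a final pass.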