odats/rl_nmt_2026_04_09_07_29
odats/rl_nmt_2026_04_09_07_29 is a 1-billion-parameter language model fine-tuned from google/gemma-3-1b-it using GRPO, a reinforcement-learning method designed to enhance mathematical reasoning in language models. It is optimized for tasks that require advanced reasoning, making it suitable for complex problem-solving and analytical applications, and its 32,768-token context length supports processing extensive inputs for detailed analysis.
Overview
This model, odats/rl_nmt_2026_04_09_07_29, is a 1-billion-parameter language model derived from the google/gemma-3-1b-it architecture. It has been fine-tuned with the TRL framework using the GRPO (Group Relative Policy Optimization) method.
Key Capabilities
- Enhanced Reasoning: The primary differentiator of this model is its training with GRPO, a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This suggests a focus on improving the model's ability to handle complex reasoning tasks.
- Fine-tuned from Gemma-3-1B-IT: Because it builds on the instruction-tuned Gemma-3-1B-IT base model, it is expected to perform well in general conversational and instruction-following scenarios, with an added specialization in reasoning.
- Long Context Window: With a context length of 32,768 tokens, it can process and generate responses based on substantial amounts of input text (see the loading sketch after this list).
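The snippet below is a minimal inference sketch, assuming the model is published on the Hugging Face Hub under the ID above and follows the standard Gemma chat template; the prompt is an illustrative placeholder.

```python
# Minimal inference sketch; assumes the standard Gemma chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_09_07_29"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative reasoning prompt, not taken from the model's training data.
messages = [
    {"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```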
Training Details
The model was trained with the TRL library (version 1.0.0) using the GRPO method detailed in the DeepSeekMath paper. GRPO is a reinforcement-learning approach that samples a group of completions per prompt and uses the group-relative rewards as the advantage baseline, avoiding a separate value model while optimizing for reasoning objectives.
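For orientation, here is a minimal sketch of what such a run can look like with TRL's GRPOTrainer. The reward function and prompts are hypothetical placeholders; the model card does not specify the actual reward signal or training data.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward function: GRPO scores each sampled completion per prompt.
# Here we simply reward completions that contain a boxed final answer.
def reward_has_boxed_answer(completions, **kwargs):
    return [1.0 if "\\boxed" in completion else 0.0 for completion in completions]

# Illustrative prompts only; the real training dataset is not documented here.
train_dataset = Dataset.from_dict(
    {"prompt": ["Compute 17 * 24.", "Simplify (x^2 - 1)/(x - 1)."]}
)

training_args = GRPOConfig(
    output_dir="gemma3-1b-grpo",
    per_device_train_batch_size=4,
    num_generations=4,  # completions sampled per prompt for the group baseline
)
trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",  # the stated base model
    reward_funcs=reward_has_boxed_answer,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```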
Good For
- Applications requiring advanced reasoning, particularly mathematical or logical problem-solving.
- Use cases where a smaller, efficient model with strong reasoning capabilities is preferred over larger, more resource-intensive alternatives.
- Developers looking for a Gemma-based model with specialized fine-tuning for complex analytical tasks.