The odats/rl_nmt_2026_04_11_13_41 model is a 1 billion parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it. It was trained with the GRPO method introduced in the DeepSeekMath paper, a reinforcement learning technique aimed at enhancing mathematical reasoning, and is primarily intended for tasks that benefit from improved reasoning capabilities.
Overview
odats/rl_nmt_2026_04_11_13_41 is a 1 billion parameter language model, fine-tuned from the google/gemma-3-1b-it base model using the TRL reinforcement learning library. A key differentiator for this model is its use of GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO dispenses with a separate value model: for each prompt it samples a group of completions, scores them with a reward function, and normalizes each completion's reward against the group's statistics to obtain its advantage. This specialized training approach aims to enhance the model's reasoning abilities.
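The group-relative advantage at the core of GRPO can be sketched in a few lines. This is an illustrative simplification, not the model's actual training code (which TRL's GRPOTrainer handles internally); whether the population or sample standard deviation is used is an implementation detail assumed here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled completion's
    reward by the mean and standard deviation of its group,
    so completions compete against their siblings rather than
    against a learned value baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four completions sampled for one prompt, two judged correct
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# → [1.0, -1.0, 1.0, -1.0]
```

Correct completions receive positive advantages and incorrect ones negative, which is what pushes the policy toward better reasoning traces during training.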
Key Capabilities
- Enhanced Reasoning: Benefits from GRPO training, which is designed to improve mathematical and general reasoning skills.
- Instruction Following: Built upon an instruction-tuned base model, making it suitable for various prompt-based tasks.
- Efficient Size: At 1 billion parameters, it offers a balance between performance and computational efficiency.
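Because the model inherits Gemma's instruction tuning, prompts should follow the Gemma chat format. In practice the tokenizer's `apply_chat_template` handles this automatically; the sketch below spells out the assumed single-turn layout (`<start_of_turn>` / `<end_of_turn>` control tokens) for illustration.

```python
def format_gemma_prompt(user_message: str) -> str:
    """Build a single-turn prompt in the Gemma chat format.
    The literal token layout here is an assumption based on
    Gemma's published chat template; prefer the tokenizer's
    apply_chat_template in real code."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("What is 17 * 24?")
```

The trailing `<start_of_turn>model` line cues the model to begin its reply, so generation should be run on the string as-is without appending anything further.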
Good For
- Applications requiring improved logical and mathematical reasoning.
- Scenarios where a smaller, yet capable, instruction-following model is preferred.
- Experimentation with models trained using advanced RL techniques like GRPO.