odats/rl_nmt_2026_04_06_16_56
odats/rl_nmt_2026_04_06_16_56 is a 1-billion-parameter language model developed by odats, fine-tuned from google/gemma-3-1b-it. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and builds on the base model's instruction-following abilities for tasks that require advanced reasoning.
Model Overview
odats/rl_nmt_2026_04_06_16_56 is a 1-billion-parameter language model fine-tuned from the google/gemma-3-1b-it base model. Its training process uses the TRL (Transformer Reinforcement Learning) library.
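A minimal inference sketch using the Transformers text-generation pipeline follows. The prompt and generation settings are illustrative assumptions, not part of this card; the heavy model load is kept behind a `__main__` guard.

```python
# Quick-start sketch for odats/rl_nmt_2026_04_06_16_56.
# The prompt and max_new_tokens below are illustrative, not from the card.

MODEL_ID = "odats/rl_nmt_2026_04_06_16_56"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat format used by Gemma-style instruct models."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    from transformers import pipeline  # heavy import kept inside the guard

    generator = pipeline("text-generation", model=MODEL_ID)
    messages = build_messages("If x + 3 = 7, what is x?")
    output = generator(messages, max_new_tokens=128)
    print(output[0]["generated_text"])
```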
Key Capabilities
- Enhanced Reasoning: The model was trained with the GRPO (Group Relative Policy Optimization) method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach aims to improve the model's ability to handle complex reasoning tasks.
- Instruction Following: Building on its gemma-3-1b-it foundation, the model is designed to follow instructions effectively, making it suitable for interactive applications.
Training Details
The model was trained with TRL 1.0.0, Transformers 4.57.6, PyTorch 2.10.0, Datasets 4.8.4, and Tokenizers 0.22.2. The use of GRPO suggests a focus on tasks that benefit from advanced reasoning, such as mathematical problem-solving or logical deduction.
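For readers exploring similar fine-tunes, a GRPO run with TRL looks roughly like the sketch below. The dataset, reward function, and hyperparameters are illustrative assumptions; this card does not disclose the actual training recipe.

```python
# Sketch of a GRPO fine-tune using TRL's GRPOTrainer.
# The reward function is a toy example (it favors completions containing a
# \boxed{...} answer); the reward used for this model is not disclosed.

def boxed_answer_reward(completions: list[str], **kwargs) -> list[float]:
    """Toy reward: 1.0 if a completion contains a \\boxed{...} answer, else 0.0."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Placeholder dataset; a math prompt set would be used in practice.
    train_dataset = load_dataset("trl-lib/tldr", split="train")

    config = GRPOConfig(output_dir="grpo-gemma-3-1b", num_generations=8)
    trainer = GRPOTrainer(
        model="google/gemma-3-1b-it",
        reward_funcs=boxed_answer_reward,
        args=config,
        train_dataset=train_dataset,
    )
    trainer.train()
```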
Good For
- Applications requiring a compact model with improved reasoning capabilities.
- Tasks where instruction-following and logical processing are crucial.
- Exploration of models fine-tuned with advanced reinforcement learning techniques like GRPO.