odats/rl_nmt_2026_04_07_10_29

Hugging Face · Text Generation

  • Model Size: 1B
  • Quantization: BF16
  • Context Length: 32k
  • Concurrency Cost: 1
  • Architecture: Transformer
  • Published: Apr 7, 2026

The odats/rl_nmt_2026_04_07_10_29 model is a 1-billion-parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it. It was trained with the TRL library using the GRPO method, which is designed to enhance mathematical reasoning. The model targets tasks that benefit from improved logical and mathematical problem-solving while remaining compact enough for lightweight deployment.


Model Overview

odats/rl_nmt_2026_04_07_10_29 is a 1-billion-parameter instruction-tuned language model, fine-tuned from the google/gemma-3-1b-it base model using the TRL library.
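As a sketch of how the model could be loaded for inference, assuming the standard transformers text-generation pipeline and chat-style `messages` input (the question and generation parameters below are illustrative):

```python
def build_messages(question: str) -> list:
    """Single-turn chat input in the format instruction-tuned chat models expect."""
    return [{"role": "user", "content": question}]

if __name__ == "__main__":
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="odats/rl_nmt_2026_04_07_10_29",
        torch_dtype="bfloat16",  # matches the BF16 precision listed on the card
    )
    result = generator(build_messages("What is 17 * 23?"), max_new_tokens=256)
    print(result[0]["generated_text"][-1]["content"])
```

The heavy pipeline call is kept under `__main__` so the prompt-building helper can be reused without downloading model weights.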

Key Differentiator: GRPO Training

A significant aspect of this model's development is its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", improves mathematical reasoning by scoring groups of sampled completions against one another rather than against a separately learned value model. As a result, the model is expected to perform better on tasks that require logical deduction and mathematical problem-solving.
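The core of GRPO can be illustrated with a minimal sketch: each completion's reward is normalized against the mean and standard deviation of its sampled group, which replaces the learned value baseline used by PPO. The toy exact-match reward and the multiplication prompt below are assumptions for illustration, not the card's actual reward setup:

```python
from statistics import mean, stdev

def correctness_reward(answer: str, target: str) -> float:
    """Toy verifiable reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer.strip() == target.strip() else 0.0

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO advantage: each reward normalized within its sampled group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to "What is 17 * 23?", scored against the target 391
rewards = [correctness_reward(a, "391") for a in ["391", "400", "381", "391"]]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward answers that beat their own group's average.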

Training Frameworks

The model was trained using specific versions of popular frameworks:

  • TRL: 1.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2
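To reproduce the training environment, the pinned versions above can be installed directly; the package names below are assumed to be the standard PyPI distributions for each framework:

```shell
pip install "trl==1.0.0" "transformers==4.57.6" "torch==2.10.0" \
    "datasets==4.8.4" "tokenizers==0.22.2"
```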

Use Cases

This model is particularly well-suited for applications where:

  • Improved reasoning and logical deduction are critical.
  • Tasks involve mathematical problem-solving or understanding complex numerical relationships.
  • A compact yet capable instruction-tuned model is required for deployment.