odats/rl_nmt_2026_04_11_13_31 is a 1-billion-parameter instruction-tuned language model fine-tuned from google/gemma-3-1b-it. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a method designed to strengthen mathematical reasoning, which makes it well suited to tasks that benefit from improved reasoning.
Overview
odats/rl_nmt_2026_04_11_13_31 is a 1-billion-parameter language model based on the google/gemma-3-1b-it architecture and fine-tuned with the TRL framework to improve its reasoning performance.
Key Capabilities
- Mathematical Reasoning: The primary differentiator of this model is its training with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper, which indicates it is optimized for tasks requiring mathematical reasoning.
- Instruction Following: As a fine-tuned version of an instruction-tuned model, it is designed to follow user instructions effectively.
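The capabilities above can be exercised through the standard `transformers` text-generation pipeline. The sketch below is illustrative: the prompt, generation parameters, and the `build_messages` helper are assumptions, not part of this model card.

```python
def build_messages(question: str) -> list[dict]:
    # Chat format expected by instruction-tuned Gemma models:
    # a single user turn carrying the question.
    return [{"role": "user", "content": question}]

if __name__ == "__main__":
    # Heavy step kept under the main guard: downloads ~1B parameters on first run.
    from transformers import pipeline

    generator = pipeline("text-generation", model="odats/rl_nmt_2026_04_11_13_31")
    messages = build_messages(
        "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
    )
    out = generator(messages, max_new_tokens=256)
    print(out[0]["generated_text"][-1]["content"])
```

Because the base model is instruction-tuned, plain chat-formatted prompts like the one above should work without any special prompt template.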
Training Details
The model's training procedure utilized GRPO, a technique aimed at pushing the limits of mathematical reasoning in open language models. The training was conducted using specific versions of key frameworks:
- TRL: 1.0.0
- Transformers: 4.57.6
- PyTorch: 2.10.0
- Datasets: 4.8.4
- Tokenizers: 0.22.2
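In TRL, GRPO training follows the `GRPOTrainer` pattern. The sketch below is hypothetical: the placeholder dataset and the toy `reward_numeric_answer` reward function are assumptions for illustration, not the actual recipe used to train this model.

```python
def reward_numeric_answer(completions, **kwargs):
    # Toy verifiable reward, a stand-in for checking a math answer:
    # +1.0 if the completion ends with a digit, else 0.0.
    return [
        1.0 if c.strip() and c.strip()[-1].isdigit() else 0.0
        for c in completions
    ]

if __name__ == "__main__":
    # Heavy step kept under the main guard: requires trl, datasets, and a GPU.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    train_dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    config = GRPOConfig(output_dir="rl_nmt_grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="google/gemma-3-1b-it",   # the base model named in this card
        reward_funcs=reward_numeric_answer,
        args=config,
        train_dataset=train_dataset,
    )
    trainer.train()
```

GRPO samples a group of completions per prompt (`num_generations`) and computes advantages relative to the group's mean reward, which removes the need for a separate value model.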
Good For
- Applications requiring enhanced mathematical reasoning.
- General instruction-following tasks where a smaller, efficient model is preferred.