odats/rl_nmt_2026_04_09_13_37
The odats/rl_nmt_2026_04_09_13_37 model is a 1-billion-parameter instruction-tuned language model fine-tuned from google/gemma-3-1b-it. It was trained with the TRL framework using the GRPO method, which is designed to enhance mathematical reasoning. The model is suited to tasks that benefit from improved reasoning, particularly mathematical problem-solving, while retaining the instruction-following behavior of the base Gemma model.
Model Overview
The odats/rl_nmt_2026_04_09_13_37 is a 1 billion parameter instruction-tuned language model, derived from the google/gemma-3-1b-it base model. It has been fine-tuned using the TRL library, a framework for transformer reinforcement learning.
Key Training Details
A significant aspect of this model's development is the application of GRPO (Group Relative Policy Optimization) during training. GRPO is a reinforcement-learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", which suggests an optimization focus on enhancing the model's reasoning abilities, particularly in mathematical contexts.
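The card does not publish the training script, but a GRPO fine-tune of this kind is typically set up with TRL's GRPOTrainer. The sketch below is a hedged illustration under that assumption: the dataset and reward function are placeholders (the card does not disclose the real ones), and only the base model id comes from the card.

```python
# Illustrative GRPO training sketch with TRL. The dataset and reward are
# placeholders, NOT the ones used for odats/rl_nmt_2026_04_09_13_37.

def reward_conciseness(completions, **kwargs):
    """Toy reward: completions closer to 200 characters score higher.
    A real run would use a task-specific reward (e.g. answer correctness)."""
    return [-abs(200 - len(c)) / 200.0 for c in completions]

def run_grpo_training(output_dir: str = "rl_nmt_grpo"):
    # Imports kept local so the reward function above can be inspected
    # and tested without TRL or datasets installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    config = GRPOConfig(output_dir=output_dir)
    trainer = GRPOTrainer(
        model="google/gemma-3-1b-it",   # base model named on this card
        reward_funcs=reward_conciseness,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO scores a group of sampled completions per prompt with the reward function and pushes the policy toward the higher-scoring ones, which is why only a scalar reward (no separate value model) is needed here.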
Intended Use Cases
Given its fine-tuning methodology, this model is likely well-suited for:
- Reasoning tasks: Especially those that benefit from improved logical and mathematical processing.
- Instruction-following: Building on its google/gemma-3-1b-it base, it should perform well in responding to user prompts.
- Applications requiring enhanced problem-solving: Where the GRPO method's benefits in mathematical reasoning can be leveraged.
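For the use cases above, the checkpoint can be loaded like any Gemma instruction model via the Transformers pipeline API. This is a minimal sketch: the model id comes from this card, while the example prompt and generation settings are illustrative assumptions.

```python
# Minimal inference sketch. Requires transformers (and network access to
# download the checkpoint on first use).

MODEL_ID = "odats/rl_nmt_2026_04_09_13_37"

def build_chat(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format Gemma instruct models expect."""
    return [{"role": "user", "content": question}]

def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so build_chat can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    result = generator(build_chat(question), max_new_tokens=max_new_tokens)
    # The pipeline returns the full chat; the last message is the model's reply.
    return result[0]["generated_text"][-1]["content"]
```

For example, `generate_answer("If 3x + 7 = 22, what is x?")` exercises the kind of mathematical reasoning the GRPO fine-tuning targets.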