odats/rl_nmt_2026_04_03_17_29

Hosted on Hugging Face · Text Generation

  • Model size: 1B
  • Quantization: BF16
  • Context length: 32k
  • Concurrency cost: 1
  • Architecture: Transformer
  • Published: Apr 3, 2026

The odats/rl_nmt_2026_04_03_17_29 model is a 1 billion parameter instruction-tuned causal language model, fine-tuned from Google's Gemma-3-1B-IT. It was trained using the TRL library and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is particularly suited for tasks requiring improved reasoning, building upon its base Gemma architecture.


Model Overview

The odats/rl_nmt_2026_04_03_17_29 model is a 1 billion parameter instruction-tuned language model derived from google/gemma-3-1b-it. It was fine-tuned with the TRL (Transformer Reinforcement Learning) library.

Key Training Details

A notable aspect of this model's development is the application of GRPO (Group Relative Policy Optimization) during its training. This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to improve performance on complex reasoning tasks, particularly in mathematical contexts. The training process used the following framework versions:

  • TRL: 1.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2
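The core idea behind GRPO is to replace a learned value-function baseline with a group-relative one: several completions are sampled per prompt, and each completion's reward is normalized by the mean and standard deviation of its own group. A minimal sketch of that normalization step (simplified; TRL's implementation additionally adds a small epsilon to the denominator for numerical stability):

```python
import statistics

def group_relative_advantages(rewards):
    """Compute GRPO-style advantages for one group of sampled completions.

    Each completion's reward is normalized by the mean and (population)
    standard deviation of all rewards in its group, so completions that
    beat their siblings get positive advantages and vice versa.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: 4 completions for one prompt, scored 1.0 (correct) or 0.0 (wrong)
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct completions receive positive advantages, incorrect ones negative.
```

These per-token-constant advantages then weight the policy-gradient update in place of a critic's value estimates.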

Potential Use Cases

Given its fine-tuning with the GRPO method, this model is likely to perform well in:

  • Reasoning-intensive tasks: Especially those that benefit from enhanced logical or mathematical processing.
  • Instruction following: Leveraging its instruction-tuned base model for various prompts.
  • Applications requiring a compact yet capable model: Its 1 billion parameter size makes it efficient for deployment while still offering advanced reasoning capabilities.
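For any of the use cases above, the model can be loaded like its Gemma-3 base. A minimal sketch using the Transformers `pipeline` API, assuming the checkpoint inherits the standard chat format of google/gemma-3-1b-it (the `generate` helper below is illustrative, not part of the model card):

```python
MODEL_ID = "odats/rl_nmt_2026_04_03_17_29"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a reply from the fine-tuned model for a single user turn."""
    # Imported inside the function so the sketch can be read/imported
    # without transformers installed or the checkpoint downloaded.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID, torch_dtype="bfloat16")
    messages = [{"role": "user", "content": prompt}]
    result = pipe(messages, max_new_tokens=max_new_tokens)
    # The pipeline returns the full chat transcript; the last turn
    # is the model's reply.
    return result[0]["generated_text"][-1]["content"]
```

BF16 weights at 1B parameters keep memory needs around 2 GB, which is what makes the model practical for the compact-deployment scenarios listed above.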