The odats/rl_nmt_2026_04_12_13_17 model is a 1 billion parameter language model, fine-tuned from google/gemma-3-1b-it. It was trained using the TRL framework and the GRPO method, which is designed to enhance mathematical reasoning. This model is particularly suited for tasks requiring improved reasoning capabilities, building upon its Gemma base.
Overview
This model, odats/rl_nmt_2026_04_12_13_17, is a 1 billion parameter language model derived from the google/gemma-3-1b-it architecture. It has been specifically fine-tuned using the TRL (Transformers Reinforcement Learning) framework.
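As a minimal usage sketch, the model can be loaded like any other causal LM on the Hub with the transformers library. The prompt below is illustrative; gated Gemma-derived weights may additionally require Hub authentication.

```python
MODEL_ID = "odats/rl_nmt_2026_04_12_13_17"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model from the Hugging Face Hub."""
    # Import kept local so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    # Gemma instruction-tuned checkpoints expect the chat template format.
    messages = [{"role": "user", "content": "What is 17 * 24? Show your reasoning."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```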
Key Training Details
The model's training incorporated the GRPO (Group Relative Policy Optimization) method. GRPO is a reinforcement learning technique introduced to improve mathematical reasoning in large language models, as detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Its use here suggests a focus on enhancing the model's logical and problem-solving abilities.
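The core idea behind GRPO can be illustrated in a few lines: rather than training a separate value model as a baseline, it samples a group of completions per prompt and normalizes each completion's reward against the group's statistics. This is an illustrative sketch of that normalization step, not the TRL implementation.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """Score each reward relative to its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # eps avoids division by zero when all rewards in the group are identical.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt, scored by some reward function:
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Above-average completions get positive advantage, below-average negative.
```

The policy is then updated to increase the likelihood of completions with positive group-relative advantage, which is what removes the need for a learned critic.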
Potential Use Cases
- Reasoning-intensive tasks: Given its training with GRPO, the model may perform well in scenarios requiring logical deduction or mathematical problem-solving.
- Building upon Gemma-3-1b-it: Developers familiar with the base Gemma model can leverage this fine-tuned version for improved performance in specific reasoning domains.
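For developers who want to run a similar fine-tune themselves, TRL ships a GRPOTrainer. The sketch below shows the general shape of such a run under stated assumptions: the dataset (gsm8k) and the exact-match reward function are illustrative stand-ins, not the actual training recipe behind this checkpoint.

```python
def exact_match_reward(completions, ground_truth, **kwargs):
    """Toy reward: 1.0 when a completion contains the reference answer."""
    return [1.0 if gt in c else 0.0 for c, gt in zip(completions, ground_truth)]

def main():
    # Imports kept local so the sketch can be read without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("openai/gsm8k", "main", split="train")  # assumed dataset
    args = GRPOConfig(output_dir="rl_nmt_grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="google/gemma-3-1b-it",   # the base model named in this card
        reward_funcs=exact_match_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```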
Framework Versions
The training utilized TRL 1.1.0, Transformers 4.57.6, PyTorch 2.10.0, Datasets 4.8.4, and Tokenizers 0.22.2.