odats/rl_nmt_2026_04_08_10_28 is a 1-billion-parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it. It was trained with the TRL library using the GRPO method, which is designed to enhance mathematical reasoning, making it suited to tasks that benefit from stronger reasoning capabilities.
Model Overview
odats/rl_nmt_2026_04_08_10_28 is a 1-billion-parameter language model, fine-tuned from the google/gemma-3-1b-it base model. It was trained with the TRL (Transformer Reinforcement Learning) library.
Key Training Details
A significant aspect of this model's development is its training methodology. It was trained using GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This indicates a focus on enhancing the model's reasoning abilities, particularly in mathematical contexts.
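To make the method concrete: the core idea of GRPO is to replace a learned value baseline with statistics computed over a group of completions sampled for the same prompt, so each completion's advantage is its reward normalized against the group mean and standard deviation. The sketch below illustrates only that advantage computation; the function name and the reward values are illustrative, not taken from this model's training code.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own group's statistics.

    In GRPO, several completions are sampled per prompt; each one's
    advantage is (reward - group mean) / group std, so no separate
    value network is needed. (Illustrative sketch, not the TRL code.)
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four completions of one prompt (e.g. 1.0 = correct answer).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantages, incorrect ones negative,
# and the advantages are centered around zero within the group.
```

These per-completion advantages then weight the policy-gradient update in place of a critic's value estimate.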
Framework Versions
The model was developed using specific versions of key frameworks:
- TRL: 1.0.0
- Transformers: 4.57.6
- PyTorch: 2.10.0+cu130
- Datasets: 4.8.4
- Tokenizers: 0.22.2
Potential Use Cases
Given its GRPO fine-tuning, this model is likely suitable for applications that benefit from improved reasoning and mathematical understanding, especially where a small parameter count is advantageous for deployment efficiency.
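A minimal way to try the model is through the Transformers `pipeline` API; the snippet below is a usage sketch, not taken from this card. The model id is the one named above, the example question is invented, and running the generator requires the checkpoint to be downloadable or cached locally.

```python
MODEL_ID = "odats/rl_nmt_2026_04_08_10_28"  # model id from this card

def build_generator():
    # Imported here so the module can be inspected without transformers
    # installed; loading pulls the checkpoint from the Hub or local cache.
    from transformers import pipeline
    return pipeline("text-generation", model=MODEL_ID)

if __name__ == "__main__":
    generator = build_generator()
    # Chat-style input; return_full_text=False drops the echoed prompt.
    out = generator(
        [{"role": "user", "content": "What is 17 * 24?"}],
        max_new_tokens=128,
        return_full_text=False,
    )
    print(out[0]["generated_text"])
```

For GPU inference, passing `device="cuda"` (or `device_map="auto"` with accelerate installed) to `pipeline` is the usual choice for a model of this size.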