odats/rl_nmt_2026_04_09_15_37
odats/rl_nmt_2026_04_09_15_37 is a 1 billion parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it using the TRL library and the GRPO method, a reinforcement learning approach designed to enhance mathematical reasoning. This makes the model particularly suited for tasks that benefit from improved reasoning capabilities.
Model Overview
odats/rl_nmt_2026_04_09_15_37 is a 1 billion parameter language model, fine-tuned from the google/gemma-3-1b-it base model using the TRL library.
Key Training Details
This model was trained using GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO replaces the learned value function of PPO with a group-relative baseline: for each prompt, several completions are sampled, and each completion's advantage is computed relative to the group's reward statistics. This specialized training approach aims to enhance the model's reasoning capabilities.
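The group-relative baseline described above can be sketched as follows. This is a minimal illustration of the advantage normalization at the heart of GRPO, not the training code used for this model; the function name is illustrative.

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Compute GRPO-style advantages for one group of sampled completions.

    Each completion's reward is normalized against the group's own mean
    and standard deviation, so no separate value (critic) network is
    needed, unlike in PPO.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # A completion scoring above the group average gets a positive
    # advantage; one scoring below it gets a negative advantage.
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four completions sampled for the same prompt, scored by a
# reward function (e.g. answer correctness).
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

In practice, TRL's `GRPOTrainer` handles this sampling and normalization internally; the sketch only shows the core idea.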
Framework Versions
- TRL: 1.0.0
- Transformers: 4.57.6
- PyTorch: 2.10.0
- Datasets: 4.8.4
- Tokenizers: 0.22.2
Potential Use Cases
Given its fine-tuning with a method focused on mathematical reasoning, this model is likely well-suited for applications that require:
- Mathematical and multi-step reasoning tasks
- Problem-solving scenarios
- Instruction following where logical deduction is beneficial
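For the use cases above, the model can be loaded with the standard `transformers` text-generation pipeline. The snippet below is a sketch following the usual pattern for instruction-tuned checkpoints; the actual generation call is commented out because it downloads the model weights, and the sample question and generation parameters are illustrative.

```python
def build_messages(question):
    # Chat-format input expected by instruction-tuned models such as
    # this Gemma-based checkpoint: a list of role/content messages.
    return [{"role": "user", "content": question}]

messages = build_messages(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
)

# Hypothetical usage (downloads the checkpoint; parameters are examples):
# from transformers import pipeline
# generator = pipeline("text-generation", model="odats/rl_nmt_2026_04_09_15_37")
# output = generator(messages, max_new_tokens=256)
# print(output[0]["generated_text"])
```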