odats/rl_nmt_2026_04_13_15_40
odats/rl_nmt_2026_04_13_15_40 is a 1-billion-parameter instruction-tuned language model fine-tuned from google/gemma-3-1b-it. Developed by odats, it was trained with the TRL library using the GRPO method, which targets improved mathematical reasoning. It is intended for tasks that require structured, multi-step reasoning, particularly in mathematical contexts, building on the strengths of the Gemma architecture.
Model Overview
odats/rl_nmt_2026_04_13_15_40 is a 1 billion parameter instruction-tuned model, fine-tuned from the google/gemma-3-1b-it base model. It leverages the TRL (Transformers Reinforcement Learning) library for its training process.
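A minimal loading-and-generation sketch using the standard transformers text-generation API (the prompt and generation settings are illustrative, not part of the model card; downloading the weights may require Hugging Face authentication, since Gemma-derived checkpoints are often gated):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "odats/rl_nmt_2026_04_13_15_40"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and generate a completion.

    Weights are downloaded from the Hub on first call.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("What is 17 * 23? Reason step by step."))
```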
Key Differentiator: GRPO Training
This model's primary distinction lies in its training methodology. It was trained using GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a specific optimization for:
- Enhanced Mathematical Reasoning: The GRPO method is designed to improve a model's ability to handle complex mathematical problems and logical reasoning tasks.
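The core idea behind GRPO, as described in the DeepSeekMath paper, is to drop the learned value-function baseline of PPO and instead sample a group of completions per prompt, scoring each with a reward and using the reward standardized within the group as the advantage. A toy sketch of that advantage computation (function name and reward values are ours, not TRL's implementation):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize rewards within one group of sampled completions.

    Completions scoring above the group mean get a positive advantage,
    those below get a negative one -- no learned value network needed.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one math prompt, scored 1.0 if correct else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # → [1.0, -1.0, -1.0, 1.0]
```

With a verifiable reward such as answer correctness, this pushes probability mass toward completions that solve the problem, which is why the method suits mathematical reasoning.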
Use Cases
Given its specialized training, this model is particularly well-suited for:
- Mathematical Problem Solving: Tasks involving arithmetic, algebra, calculus, and other mathematical domains.
- Logical Reasoning: Scenarios requiring structured thought and deduction.
- Instruction Following: Benefiting from its instruction-tuned base, it can respond to user prompts effectively, especially those with a reasoning component.
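Because the base model is instruction-tuned, prompts should follow Gemma's turn format. A hand-rolled sketch of that format is shown below for clarity; in practice, prefer `tokenizer.apply_chat_template`, which applies the checkpoint's own template:

```python
def format_gemma_prompt(user_message: str) -> str:
    """Gemma-style turn markers, written out by hand for illustration.

    Real code should use tokenizer.apply_chat_template instead, which
    also handles special tokens like <bos> correctly.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("Solve for x: 3x + 5 = 20.")
print(prompt)
```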
Technical Details
- Base Model: google/gemma-3-1b-it
- Training Framework: TRL (Transformers Reinforcement Learning)
- Training Method: GRPO
- Parameter Count: 1 billion
- Context Length: 32768 tokens