odats/rl_nmt_2026_04_06_16_48 is a 1-billion-parameter language model fine-tuned from google/gemma-3-1b-it. It was trained with the TRL library using the GRPO method, which is designed to enhance mathematical reasoning capabilities. With a context length of 32768 tokens, it is suited to tasks requiring advanced reasoning, particularly in mathematical domains.
Model Overview
odats/rl_nmt_2026_04_06_16_48 is a 1-billion-parameter language model fine-tuned from the google/gemma-3-1b-it base model, using the TRL library for its training procedure.
Key Differentiator: GRPO Training
A significant aspect of this model's training is the application of GRPO (Group Relative Policy Optimization). This method, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is specifically designed to improve a model's capabilities in mathematical reasoning tasks, which suggests the model is optimized for complex problem-solving and logical deduction.
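To illustrate what GRPO training looks like in practice, below is a minimal sketch using TRL's GRPOTrainer, closely following the library's quickstart. The dataset and toy length-based reward are placeholders, not the actual recipe used for this model, which is not documented here; a mathematical-reasoning setup would typically reward answer correctness instead.

```python
# Minimal GRPO sketch with TRL's GRPOTrainer, adapted from the TRL quickstart.
# The dataset and toy length-based reward are placeholders, NOT the recipe
# used to train this model; a math-reasoning setup would instead score
# answer correctness.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Prompt-only dataset; GRPOTrainer samples groups of completions itself.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="gemma-3-1b-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",  # base model named on this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```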
Technical Specifications
- Base Model: google/gemma-3-1b-it
- Parameter Count: 1 billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 1.0.0), Transformers (version 4.57.6), PyTorch (version 2.10.0), Datasets (version 4.8.4), Tokenizers (version 0.22.2).
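For reference, a minimal loading sketch with Transformers follows; the dtype and device settings are convenience assumptions, not requirements of the checkpoint.

```python
# Loading sketch for this checkpoint with Transformers.
# torch_dtype and device_map are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_06_16_48"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; pick a dtype your hardware supports
    device_map="auto",           # requires accelerate; assumed for convenience
)
```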
Use Cases
Given its fine-tuning with the GRPO method, this model is particularly well-suited for the following (see the usage sketch after this list):
- Mathematical Reasoning: Solving complex mathematical problems and generating logical steps.
- Problem Solving: Tasks requiring structured thought and deductive reasoning.
- Instruction Following: Responding to prompts that demand precise and reasoned answers.
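As a self-contained illustration of these use cases, the sketch below prompts the model on a simple word problem through the Transformers pipeline API; the question and generation settings are assumptions for demonstration only.

```python
# Usage sketch: a math-reasoning prompt via the text-generation pipeline.
# The prompt and max_new_tokens are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="odats/rl_nmt_2026_04_06_16_48")

messages = [
    {"role": "user",
     "content": "A train travels 120 km in 1.5 hours. "
                "What is its average speed? Reason step by step."},
]

# With chat-style input, the pipeline applies the model's chat template
# and returns the conversation with the assistant's reply appended.
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])
```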