Name: odats/rl_nmt_2026_04_07_11_01 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: odats

Model Overview

odats/rl_nmt_2026_04_07_11_01 is a 1 billion parameter language model, derived from the google/gemma-3-1b-it base model. It has been fine-tuned using the TRL (Transformers Reinforcement Learning) framework, specifically employing the GRPO (Gradient-based Reinforcement Learning with Policy Optimization) method.

Key Capabilities

Enhanced Mathematical Reasoning: The core differentiator of this model is its training with GRPO, a method introduced in the context of improving mathematical reasoning in large language models. This suggests a specialization in handling complex mathematical problems and logical deductions.
Instruction Following: As a fine-tuned version of an instruction-tuned model (gemma-3-1b-it), it retains strong capabilities in understanding and following user instructions.
Efficient Performance: With 1 billion parameters, it offers a balance between performance and computational efficiency, making it suitable for applications where resource constraints are a consideration.

Training Details

The model's training procedure leveraged the TRL framework, with specific emphasis on GRPO, as detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This indicates a focused approach to improving its ability to process and generate mathematically sound responses.

Use Cases

This model is particularly well-suited for applications requiring:

Mathematical problem-solving.
Logical reasoning tasks.
Educational tools for mathematics.
Any scenario where robust numerical and logical understanding is critical.

Overview

Model Overview

Key Capabilities

Training Details

Use Cases

Full Model Card (README)