odats/rl_nmt_2026_04_03_16_45
odats/rl_nmt_2026_04_03_16_45 is a 1 billion parameter instruction-tuned causal language model developed by odats, fine-tuned from google/gemma-3-1b-it. It was trained with the GRPO method to specialize it for mathematical reasoning. With a context length of 32768 tokens, it is suited to tasks requiring robust logical and mathematical problem-solving.
Overview
This model, odats/rl_nmt_2026_04_03_16_45, is a 1 billion parameter instruction-tuned language model. It is a fine-tuned version of the google/gemma-3-1b-it base model, developed by odats. The fine-tuning process used the TRL (Transformer Reinforcement Learning) library.
Key Capabilities
- Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the "DeepSeekMath" paper, which aims to push the limits of mathematical reasoning in open language models.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively, generating relevant and coherent responses.
- Context Handling: It supports a context length of 32768 tokens, allowing for processing and generating longer sequences of text.
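Since the base model is google/gemma-3-1b-it, prompts presumably follow Gemma's turn-based chat format. A minimal sketch of that format (assumed from the Gemma base models, not stated by this card; in practice, `tokenizer.apply_chat_template` handles this automatically):

```python
def format_gemma_chat(messages):
    """Render a message list in Gemma's turn-based chat format.

    NOTE: the turn markers below are assumed from the google/gemma
    base models; prefer tokenizer.apply_chat_template in real use.
    """
    parts = []
    for msg in messages:
        parts.append(f"<start_of_turn>{msg['role']}\n{msg['content']}<end_of_turn>\n")
    # Leave the model turn open so generation continues from here.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = format_gemma_chat([{"role": "user", "content": "What is 12 * 7?"}])
```

The resulting string is what the tokenizer would see before generation; the trailing open `model` turn prompts the model to produce its answer.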
Training Details
The model's training leveraged TRL for reinforcement learning. The GRPO method, detailed in the DeepSeekMath paper, was central to its optimization for mathematical tasks. Training progress and metrics were tracked using Weights & Biases.
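The core idea of GRPO is to score a group of sampled completions for the same prompt and normalize each reward against the group's mean and standard deviation, replacing a learned value model with a group-relative baseline. A minimal sketch of that advantage computation (illustrative only; TRL's `GRPOTrainer` implements the full method):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO (DeepSeekMath):
    normalize each completion's reward by the mean and std of
    its own sampling group, so no value model is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Hypothetical rewards for 4 sampled completions of one math prompt
# (1.0 = correct answer, 0.0 = incorrect).
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])  # -> [1.0, -1.0, 1.0, -1.0]
```

Completions above the group mean receive positive advantage and are reinforced; those below are penalized, steering the policy toward correct mathematical solutions.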
Good For
- Applications requiring strong mathematical reasoning.
- Tasks benefiting from instruction-tuned models with a focus on logical problem-solving.
- Scenarios where a smaller, specialized model with a large context window is advantageous.