odats/rl_nmt_2026_04_03_17_00
The odats/rl_nmt_2026_04_03_17_00 model is a 1-billion-parameter language model fine-tuned from google/gemma-3-1b-it using the TRL framework. It was trained with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper to enhance mathematical reasoning. The model is optimized for tasks requiring improved reasoning, particularly in mathematical contexts, and supports a context length of 32,768 tokens.
Model Overview
odats/rl_nmt_2026_04_03_17_00 is a 1-billion-parameter language model built on the google/gemma-3-1b-it architecture. It has been fine-tuned using the TRL (Transformer Reinforcement Learning) framework, specifically with the GRPO (Group Relative Policy Optimization) method.
Key Capabilities
- Enhanced Reasoning: Training with GRPO, a technique introduced in the DeepSeekMath paper, is aimed at improving the model's reasoning abilities.
- Mathematical Contexts: Given its training methodology, it is particularly suited for tasks that involve mathematical reasoning.
- Extended Context Window: Supports a context length of 32,768 tokens, allowing the model to process longer inputs and maintain coherence across extended conversations or documents.
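As a usage sketch, the checkpoint can presumably be loaded with the Hugging Face `transformers` library like any other Gemma-derived instruction-tuned model. The helper below is illustrative: the prompt, generation settings, and function name are assumptions, not values stated in this card.

```python
# Hypothetical inference helper for this checkpoint. Assumes a standard
# Gemma-style chat template; generation settings are illustrative defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_03_17_00"

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and answer a single user turn via the chat template."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate_reply("If 3x + 5 = 20, what is x?")` would then return the model's answer as a string.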
Training Details
The model leverages the GRPO method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", reflecting a focus on robust and accurate handling of complex logical and mathematical problems.
Good For
- Applications requiring a compact model with strong reasoning capabilities.
- Tasks involving mathematical problem-solving or logical deduction.
- Scenarios where a larger context window is beneficial for understanding complex queries.