odats/rl_nmt_2026_04_12_13_14
Text generation · 1B parameters · BF16 · 32k context length · Transformer · Published: Apr 12, 2026

odats/rl_nmt_2026_04_12_13_14 is a 1-billion-parameter language model fine-tuned from google/gemma-3-1b-it with a context length of 32768 tokens. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method developed for mathematical reasoning, using the TRL library, making it suitable for tasks that require advanced logical and mathematical problem solving.


Overview

odats/rl_nmt_2026_04_12_13_14 is a 1 billion parameter language model, fine-tuned from the google/gemma-3-1b-it base model. Its 32768-token context length allows it to process long inputs for complex tasks. Training used GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", implemented with the TRL framework to strengthen the model's reasoning capabilities.
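GRPO's central trick is to score each sampled completion relative to the other completions drawn for the same prompt, normalizing rewards within the group instead of training a separate value model. A minimal pure-Python sketch of that advantage computation (the function name and reward values are illustrative, not taken from this model's training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against the group sampled for
    the same prompt: subtract the group mean and divide by the group
    standard deviation (plus eps for numerical safety)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one math prompt, scored by a binary
# correctness reward: two correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, and the advantages sum to zero within the group, which is what lets GRPO dispense with a learned critic.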

Key Capabilities

  • Enhanced Reasoning: Specifically trained with GRPO, a method designed to improve mathematical and logical reasoning skills.
  • Large Context Window: Supports a 32768-token context length, beneficial for understanding and generating responses based on extensive information.
  • Fine-tuned from Gemma: Builds upon the robust architecture and pre-training of the Gemma-3-1b-it model.
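Because the base checkpoint is an instruction-tuned Gemma model, prompts are expected to follow Gemma's chat-turn format. A minimal sketch of that formatting (the helper name is illustrative; in practice, prefer the tokenizer's built-in `apply_chat_template` method):

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma's <start_of_turn>/<end_of_turn>
    chat markers and open the model's turn. Illustrative helper only;
    real code should use tokenizer.apply_chat_template instead."""
    return (
        f"<start_of_turn>user\n{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_prompt("What is 17 * 24?")
```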

Good For

  • Mathematical Problem Solving: Ideal for applications requiring precise mathematical reasoning and logical deduction.
  • Complex Query Handling: Suitable for tasks that benefit from processing and synthesizing information from long input sequences.
  • Research and Development: A strong candidate for further experimentation and fine-tuning on specialized reasoning datasets.