odats/rl_nmt_2026_04_08_10_56
Text generation · Model size: 1B · Quant: BF16 · Context length: 32k · Architecture: Transformer · Published: Apr 8, 2026

odats/rl_nmt_2026_04_08_10_56 is a 1-billion-parameter instruction-tuned causal language model, fine-tuned by odats from google/gemma-3-1b-it. It was trained with GRPO, a reinforcement learning method designed to enhance mathematical reasoning, and it builds on the base model's conversational abilities, making it well suited for tasks that require step-by-step reasoning.


Model Overview

odats/rl_nmt_2026_04_08_10_56 is a 1-billion-parameter instruction-tuned language model, fine-tuned from the google/gemma-3-1b-it base model. It was developed by odats using the TRL (Transformer Reinforcement Learning) framework.
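Since the model inherits its chat format from its Gemma base, prompts follow the Gemma-family turn markers. A minimal sketch of that format (in practice you would call the tokenizer's `apply_chat_template`; `build_gemma_prompt` is an illustrative helper, not a library function):

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a single user turn in Gemma-style turn markers,
    leaving the model's turn open for generation."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_prompt("What is 17 * 24?")
print(prompt)
```

The resulting string is what the tokenizer would produce for a one-turn conversation before generation begins.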

Key Capabilities

  • Enhanced Reasoning: This model was trained using the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper. This training approach is specifically designed to push the limits of mathematical reasoning in open language models.
  • Instruction Following: As an instruction-tuned model, it is capable of understanding and executing user prompts effectively, building on the capabilities of its Gemma base.
  • Context Length: Supports a context length of 32768 tokens, allowing for processing longer inputs and maintaining conversational coherence over extended interactions.
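The GRPO method named above scores each sampled completion against the other completions in its group rather than against a learned value function. A minimal sketch of that group-relative advantage computation (illustrative, not this model's actual training code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against its sampling group:
    A_i = (r_i - mean(group)) / (std(group) + eps), as in GRPO."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for the same prompt, scored by a reward function:
# two correct (reward 1.0), two incorrect (reward 0.0).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update reinforces whatever distinguished the better samples within each group.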

Good For

  • Mathematical Reasoning Tasks: Its GRPO training makes it particularly well-suited for applications requiring robust mathematical problem-solving and logical deduction.
  • General Instruction-Following: Can be used for a wide range of conversational and generative AI tasks where clear instruction adherence is important.
  • Research in RLHF Methods: Provides a practical example of GRPO application, useful for researchers exploring advanced reinforcement learning techniques for language models.
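For the last use case, GRPO is typically run against programmatic reward functions with verifiable answers. A hypothetical example of such a reward for math problems (`math_answer_reward` and its extraction regex are illustrative assumptions, not this model's actual reward):

```python
import re

def math_answer_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last number in the completion matches the
    reference answer, else 0.0 (a simple verifiable-math reward)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference else 0.0

print(math_answer_reward("17 * 24 = 408, so the answer is 408", "408"))  # → 1.0
```

Rewards like this feed directly into the group-relative advantage computation, which is what makes GRPO practical for tasks where correctness can be checked automatically.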