odats/rl_nmt_2026_04_13_15_39

Text generation · Model size: 1B · Quant: BF16 · Context length: 32k · Published: Apr 13, 2026 · Architecture: Transformer

odats/rl_nmt_2026_04_13_15_39 is a 1 billion parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it. It was trained using the TRL library and the GRPO method, which is designed to enhance mathematical reasoning. This model is optimized for tasks requiring advanced reasoning capabilities, particularly in mathematical contexts, and supports a context length of 32768 tokens.
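A minimal inference sketch using the Hugging Face `transformers` library (the model id is taken from this card; generation settings and the sample question are illustrative, and the heavy library import happens lazily inside `generate` so the small helper works on its own):

```python
def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat format expected by apply_chat_template."""
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the fine-tuned checkpoint and answer a single question."""
    # Imported lazily so build_messages() is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "odats/rl_nmt_2026_04_13_15_39"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the quantization listed on this card.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("If 3x + 5 = 20, what is x?"))
```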


Overview

odats/rl_nmt_2026_04_13_15_39 is a 1 billion parameter language model, fine-tuned from the google/gemma-3-1b-it base model using the TRL (Transformer Reinforcement Learning) library. Its key differentiator is the training methodology: GRPO (Group Relative Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
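For context, this style of GRPO fine-tuning can be sketched with TRL's `GRPOTrainer`. The reward function and dataset below are illustrative placeholders (a simple final-number match against a reference column named `answer`), not the actual recipe used to produce this checkpoint:

```python
import re

def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 when the completion's last number matches the reference answer.

    GRPO reward functions receive the batch of generated completions plus any
    extra dataset columns (here, a hypothetical `answer` column) as keyword args.
    Assumes plain-text completions, not the conversational message-list format.
    """
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards

if __name__ == "__main__":
    # Illustrative training setup (heavy: downloads the base model and a dataset).
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("openai/gsm8k", "main", split="train")  # example math dataset
    args = GRPOConfig(
        output_dir="rl_nmt_grpo",
        num_generations=8,          # completions sampled per prompt for the group baseline
        max_completion_length=512,
    )
    trainer = GRPOTrainer(
        model="google/gemma-3-1b-it",
        reward_funcs=correctness_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```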

Key Capabilities

  • Enhanced Reasoning: Optimized for tasks that require advanced reasoning, particularly in mathematical domains, due to its GRPO-based training.
  • Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively.
  • Context Handling: Supports a substantial context length of 32768 tokens, allowing for processing longer inputs and maintaining conversational coherence.
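When working near the 32768-token limit, it helps to budget prompt and generation length together before calling the model. A minimal sketch (the helper name and defaults are illustrative):

```python
def fits_context(num_prompt_tokens: int, max_new_tokens: int,
                 context_length: int = 32768) -> bool:
    """Check that the prompt plus the planned generation fits the context window."""
    return num_prompt_tokens + max_new_tokens <= context_length

# Example: a 30k-token prompt leaves room for up to ~2.7k new tokens.
print(fits_context(30_000, 2_000))   # True
print(fits_context(31_000, 2_000))   # False
```

In practice `num_prompt_tokens` would come from the model's tokenizer, e.g. `len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))`.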

Good For

  • Mathematical Reasoning Tasks: Ideal for applications requiring robust mathematical problem-solving and logical deduction.
  • Instruction-based Generation: Suitable for general instruction-following tasks where a smaller, efficient model is preferred.
  • Research and Development: Provides a foundation for further experimentation with GRPO-based fine-tuning on Gemma architectures.