odats/rl_nmt_2026_04_10_07_47

Hugging Face model card · Text Generation
Model size: 1B · Quantization: BF16 · Context length: 32k · Architecture: Transformer · Published: Apr 10, 2026

The odats/rl_nmt_2026_04_10_07_47 model is a 1 billion parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it. Developed by odats, it was trained with the TRL framework using the GRPO method, a reinforcement learning technique introduced in DeepSeekMath to strengthen mathematical reasoning. With a context length of 32,768 tokens, it is suited to tasks that require extended, multi-step reasoning over long inputs.


Model Overview

The odats/rl_nmt_2026_04_10_07_47 is a 1 billion parameter instruction-tuned language model, building upon the google/gemma-3-1b-it base. It was developed by odats and fine-tuned using the TRL (Transformers Reinforcement Learning) framework.

Key Training Methodology

A distinguishing feature of this model is its training procedure, which uses GRPO (Group Relative Policy Optimization). This method was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The use of GRPO suggests the model was optimized for tasks involving complex reasoning, particularly in mathematical contexts.
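The core idea of GRPO is to sample a group of completions per prompt and score each one relative to its own group, avoiding the separate value network used by PPO. A minimal sketch of the group-relative advantage step, assuming population-standard-deviation normalization (implementations such as TRL's `GRPOTrainer` may differ in details like epsilon handling):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each completion's reward
    against the mean and std of its own sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions sampled for one math prompt, scored 1.0 if the
# final answer is correct and 0.0 otherwise (a hypothetical reward).
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(group_rewards)
print([round(a, 2) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantage and incorrect ones negative, so the policy gradient pushes probability mass toward the better samples within each group.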

Technical Specifications

  • Base Model: google/gemma-3-1b-it
  • Parameters: 1 billion
  • Context Length: 32768 tokens
  • Frameworks Used: TRL (v1.0.0), Transformers (v4.57.6), PyTorch (v2.10.0), Datasets (v4.8.4), Tokenizers (v0.22.2)
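Because the base model is google/gemma-3-1b-it, prompts presumably follow the Gemma chat turn format, where each turn is wrapped in `<start_of_turn>{role}` / `<end_of_turn>` markers. A minimal sketch of that formatting (in practice, prefer `tokenizer.apply_chat_template`, which applies the model's own template; the markers below are assumed from the Gemma convention):

```python
def build_gemma_prompt(messages):
    """Render a chat as a Gemma-style prompt string, ending with an
    open model turn so the model continues as the assistant."""
    parts = []
    for m in messages:
        parts.append(
            f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n"
        )
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = build_gemma_prompt(
    [{"role": "user", "content": "What is 17 * 24?"}]
)
print(prompt)
```

When loading the model through Transformers, passing the same message list to the tokenizer's chat template yields an equivalent prompt without hand-coding the markers.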

Potential Use Cases

Given its fine-tuning with GRPO, this model is likely well-suited for:

  • Mathematical problem-solving and reasoning tasks.
  • Applications requiring logical deduction and analytical capabilities.
  • Instruction-following scenarios where precise and reasoned responses are critical.