odats/nmt_21
Hugging Face
Text Generation · Model Size: 1B · Quantization: BF16 · Context Length: 32k · Architecture: Transformer · Published: Oct 3, 2025

odats/nmt_21 is a fine-tuned language model based on google/gemma-3-1b-it, developed by odats. It was trained with the TRL framework using the GRPO method, which is designed to enhance mathematical reasoning, making it suitable for tasks where mathematical problem-solving is critical.


Model Overview

odats/nmt_21 is a fine-tuned iteration of the google/gemma-3-1b-it model, developed by odats. This model leverages the TRL (Transformers Reinforcement Learning) framework for its training process.

Key Training Details

A significant aspect of this model's development is the integration of GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," is designed to improve performance on tasks requiring advanced reasoning, particularly in mathematical contexts. The training run can be visualized via Weights & Biases.
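As a rough illustration of how a GRPO fine-tune of this kind might be set up with TRL's GRPOTrainer: the dataset, reward function, and hyperparameters below are illustrative assumptions, not the actual recipe used for odats/nmt_21. The heavy trainer construction is wrapped in a function because it downloads the (gated) base model.

```python
# Hypothetical GRPO fine-tuning sketch with TRL. The prompts, reward
# function, and hyperparameters are illustrative assumptions only.

def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 when the expected answer appears in the completion, else 0.0.

    TRL passes extra dataset columns (here: 'answer') to reward
    functions as keyword arguments.
    """
    return [1.0 if ans in completion else 0.0
            for completion, ans in zip(completions, answer)]

def build_trainer():
    """Construct the trainer; calling this downloads google/gemma-3-1b-it."""
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy math prompts with known answers (placeholder data).
    train_dataset = Dataset.from_list([
        {"prompt": "What is 12 * 7? Answer with the number only.", "answer": "84"},
        {"prompt": "What is 15 + 27? Answer with the number only.", "answer": "42"},
    ])

    training_args = GRPOConfig(
        output_dir="nmt_21-grpo",
        num_generations=4,        # completions sampled per prompt (the "group")
        max_completion_length=64,
    )

    return GRPOTrainer(
        model="google/gemma-3-1b-it",   # base model named on this card
        reward_funcs=correctness_reward,
        args=training_args,
        train_dataset=train_dataset,
    )

# trainer = build_trainer()
# trainer.train()  # requires a GPU and the gated base model weights
```

The group-relative aspect of GRPO comes from sampling several completions per prompt (`num_generations`) and normalizing each completion's reward against its group, which removes the need for a separate value model.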

Framework Versions Used

  • TRL: 1.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2
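To reproduce this environment, the listed versions can be pinned at install time (assuming pip is available; the PyTorch package is published on PyPI as `torch`):

```shell
# Pin the framework versions listed on this card
pip install "trl==1.0.0" "transformers==4.57.6" "torch==2.10.0" \
            "datasets==4.8.4" "tokenizers==0.22.2"
```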

Potential Use Cases

Given its fine-tuning with the GRPO method, this model is likely optimized for:

  • Mathematical reasoning tasks
  • Problem-solving scenarios that benefit from enhanced logical deduction
  • Applications requiring improved accuracy in numerical or scientific contexts
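A minimal inference sketch for these use cases, using the Transformers text-generation pipeline (the prompt wording, sampling settings, and helper names here are assumptions, not documented usage; running `solve` downloads the model weights):

```python
# Hypothetical inference sketch for odats/nmt_21; prompt format and
# generation settings are illustrative assumptions.

def format_math_prompt(problem: str) -> list:
    """Wrap a math problem as a chat message list for the pipeline."""
    return [{"role": "user", "content": f"Solve step by step: {problem}"}]

def solve(problem: str) -> str:
    """Generate a solution; calling this downloads the model weights."""
    from transformers import pipeline
    generator = pipeline(
        "text-generation",
        model="odats/nmt_21",
        torch_dtype="bfloat16",  # BF16, as listed on this card
    )
    out = generator(format_math_prompt(problem), max_new_tokens=256)
    # Chat pipelines return the full message list; take the assistant reply.
    return out[0]["generated_text"][-1]["content"]

# Example (requires the model download):
# print(solve("If 3x + 5 = 20, what is x?"))
```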