odats/rl_nmt_2026_04_09_15_37

Text Generation · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 9, 2026 · Architecture: Transformer

The odats/rl_nmt_2026_04_09_15_37 is a 1-billion-parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it using the TRL library and the GRPO method, a reinforcement-learning technique designed to enhance mathematical reasoning. It is therefore best suited to tasks that benefit from improved reasoning capabilities.


Model Overview

The odats/rl_nmt_2026_04_09_15_37 is a 1 billion parameter language model, fine-tuned from the google/gemma-3-1b-it base model. It leverages the TRL library for its training procedure.
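The model can be loaded like any other Transformers causal LM. The snippet below is a minimal usage sketch via the standard transformers text-generation pipeline; the prompt and sampling settings are illustrative, not values from the training run.

```python
# Minimal usage sketch; the prompt and generation settings are illustrative.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="odats/rl_nmt_2026_04_09_15_37",
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# Gemma-style instruct models expect chat-formatted input; the pipeline
# applies the tokenizer's chat template automatically for message lists.
messages = [{"role": "user", "content": "What is 17 * 24? Explain your steps."}]
out = generator(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```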

Key Training Details

This model was trained using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions per prompt and uses each completion's reward relative to the group as the advantage signal, avoiding a separate value model; this training approach aims to enhance the model's reasoning capabilities.
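In TRL, a GRPO run is typically set up through GRPOTrainer. The sketch below is hypothetical: the dataset, reward function, and hyperparameters are placeholders, since the card does not disclose the actual training recipe; only the base model name is taken from the card.

```python
# Hypothetical GRPO training sketch with TRL. The dataset, reward function,
# and hyperparameters are placeholders, NOT the recipe used for this model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a "prompt" column.
dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def reward_num_unique_words(completions, **kwargs):
    """Toy reward: favor completions with more unique words."""
    return [float(len(set(c.split()))) for c in completions]

training_args = GRPOConfig(
    output_dir="rl_nmt_grpo",
    num_generations=8,        # completions sampled per prompt (the "group")
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",  # the base model named in this card
    reward_funcs=reward_num_unique_words,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In practice the toy reward above would be replaced by a task-specific one, e.g. a verifier that checks a math answer against a reference solution.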

Framework Versions

  • TRL: 1.0.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2
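
To check a local environment against the versions listed above, a quick runtime comparison might look like this (a convenience sketch, not part of the card):

```python
# Sanity-check installed versions against those listed on the card.
import datasets
import tokenizers
import torch
import transformers
import trl

expected = {
    "trl": "1.0.0",
    "transformers": "4.57.6",
    "torch": "2.10.0",
    "datasets": "4.8.4",
    "tokenizers": "0.22.2",
}
installed = {
    "trl": trl.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"got {installed[name]}"
    print(f"{name:>12}: expected {want} -> {status}")
```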

Potential Use Cases

Given its fine-tuning with a method focused on mathematical reasoning, this model is likely well suited to applications that require the following (a usage sketch follows the list):

  • Enhanced reasoning tasks
  • Problem-solving scenarios
  • Instruction-following where logical deduction is beneficial
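
As an illustration of these use cases, the sketch below poses a short word problem through the tokenizer's chat template. The question and decoding settings are invented for this example, not drawn from the model card.

```python
# Illustrative reasoning prompt; the question and decoding settings are
# invented for this example, not drawn from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_09_15_37"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "A train travels 120 km in 1.5 hours. At the same speed, "
                "how long does it take to travel 200 km? Show your reasoning."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```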