odats/rl_nmt_2026_04_09_15_36
TEXT GENERATION
Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 9, 2026 · Architecture: Transformer Warm
The odats/rl_nmt_2026_04_09_15_36 model is a 1 billion parameter language model fine-tuned from google/gemma-3-1b-it using the TRL framework. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning capabilities. The model is optimized for tasks requiring advanced reasoning, building on the foundational capabilities of the Gemma architecture.
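GRPO works by sampling a group of completions per prompt, scoring each with a reward function, and normalizing rewards within the group to form the advantage signal. The sketch below shows a hypothetical accuracy-style reward function in the shape TRL's GRPOTrainer expects (a callable returning one float per completion); the `\boxed{...}` answer format and the function name are illustrative assumptions, not this model's actual training setup.

```python
import re

def accuracy_reward(completions, ground_truths):
    """Hypothetical GRPO reward: 1.0 when the completion's final
    \\boxed{...} answer matches the ground truth, else 0.0.
    GRPO normalizes these scores within each sampled group of
    completions to compute per-completion advantages."""
    rewards = []
    for completion, truth in zip(completions, ground_truths):
        # Extract the last boxed answer from the completion, if any
        matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
        answer = matches[-1].strip() if matches else None
        rewards.append(1.0 if answer == truth else 0.0)
    return rewards
```

A function like this could then be passed as `reward_funcs` to TRL's `GRPOTrainer` alongside a `GRPOConfig`; exact hyperparameters used for this checkpoint are not documented here.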