odats/rl_nmt_2026_04_10_07_47
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 10, 2026 · Architecture: Transformer · Warm

odats/rl_nmt_2026_04_10_07_47 is a 1-billion-parameter instruction-tuned language model fine-tuned from google/gemma-3-1b-it. Developed by odats, it was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in DeepSeekMath to improve mathematical reasoning. With a context length of 32,768 tokens, it is suited to tasks that require extended analytical processing.
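A minimal usage sketch with the Hugging Face `transformers` library, assuming the model follows the standard Gemma chat template inherited from its base model (the prompt and generation settings here are illustrative, not from the model card):

```python
# Sketch: load the model and run a short chat-style generation.
# Assumes a local GPU/CPU with enough memory for a 1B BF16 model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "odats/rl_nmt_2026_04_10_07_47"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Chat-formatted prompt; the base model's template adds the turn markers.
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a bounded number of new tokens and decode only the completion.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model targets reasoning tasks, longer `max_new_tokens` budgets may be needed for multi-step answers; the 32k context leaves ample room for lengthy prompts.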
