odats/rl_nmt_2026_04_06_16_19
TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kPublished:Apr 6, 2026Architecture:Transformer Cold

The odats/rl_nmt_2026_04_06_16_19 model is a 1 billion parameter instruction-tuned language model, fine-tuned from google/gemma-3-1b-it. It was trained using the TRL library and incorporates the GRPO method, which is designed to enhance mathematical reasoning. This model is optimized for tasks requiring improved reasoning capabilities, particularly in areas where mathematical understanding is beneficial, and supports a context length of 32768 tokens.

Loading preview...