odats/rl_nmt_2026_04_09_15_37
Text Generation | Concurrency Cost: 1 | Model Size: 1B | Quant: BF16 | Ctx Length: 32k | Published: Apr 9, 2026 | Architecture: Transformer | Warm

odats/rl_nmt_2026_04_09_15_37 is a 1-billion-parameter instruction-tuned language model fine-tuned from google/gemma-3-1b-it. It was trained with the TRL library using GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced to improve mathematical reasoning. The model is aimed at tasks that call for stronger reasoning than the base model provides.
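To make the GRPO step more concrete, the following is a minimal sketch of the group-relative advantage it relies on: several completions are sampled for the same prompt, each is scored, and each score is normalized against its own group. The reward values, the function name `group_relative_advantages`, and the `eps` constant are illustrative assumptions; the reward function and training configuration actually used for this model are not stated here, and the population standard deviation is used only to keep the sketch dependency-free.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each completion's reward against its own sampling group.

    Hypothetical helper for illustration; not part of the TRL API.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, chosen here for simplicity
    # eps avoids division by zero when every completion scores the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed rewards for four completions sampled from one prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group mean get a positive advantage and are reinforced, while those below it are discouraged. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.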
