iamthewalrus67/kulyk-uk-en-grpo

Text Generation | Concurrency Cost: 1 | Model Size: 0.35B | Quant: BF16 | Ctx Length: 32k | Published: Mar 15, 2026 | License: lfm1.0 | Architecture: Transformer

The iamthewalrus67/kulyk-uk-en-grpo model is a 0.35-billion-parameter Ukrainian-to-English translation model based on the LFM2-350M architecture. It is an improved version of Yehor/kulyk-uk-en, fine-tuned with GRPO (Group Relative Policy Optimization) using calibrated guardrail rewards on WikiMatrix data. It scores higher than its baseline on BLEU, chrF, and CometKiwi on both the FLoRes+ devtest and WMT24 uk-en benchmarks. Its primary application is reward-driven Ukrainian-to-English machine translation.


Overview

iamthewalrus67/kulyk-uk-en-grpo is a 0.35-billion-parameter Ukrainian-to-English translation model. It builds on the Yehor/kulyk-uk-en (LFM2-350M) baseline and improves translation quality through fine-tuning with GRPO (Group Relative Policy Optimization). Training used calibrated guardrail rewards on a WikiMatrix uk-en dataset of 132,000 pairs.

Key Capabilities

  • Improved Translation Quality: Scores higher than the Yehor/kulyk-uk-en baseline on BLEU, chrF, and CometKiwi.
  • Reward-Driven Optimization: Trained with a reward that combines chrF, BLEU, CometKiwi, and five calibrated guardrails.
  • Benchmark Performance: The gains hold on both FLoRes+ devtest (sentence-by-sentence) and WMT24 uk-en (news domain, out-of-distribution).
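
The reward combination listed above can be illustrated with a minimal sketch. It is not the training code for this model: the two guardrails shown, the weights, and every function name are hypothetical, and the card does not say which five guardrails were used or how they were calibrated. The sketch assumes that chrF, BLEU, and CometKiwi have already been computed and scaled to [0, 1] by an external scorer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A guardrail maps (source, hypothesis) to a penalty in [0, 1]; 0 means no violation.
Guardrail = Callable[[str, str], float]


def length_ratio_guardrail(source: str, hypothesis: str) -> float:
    """Hypothetical guardrail: penalize outputs far shorter or longer than the source."""
    if not hypothesis.strip():
        return 1.0
    ratio = len(hypothesis) / max(len(source), 1)
    return 0.0 if 0.5 <= ratio <= 2.0 else 1.0


def cyrillic_leak_guardrail(source: str, hypothesis: str) -> float:
    """Hypothetical guardrail: penalize untranslated Cyrillic left in the English output."""
    letters = [ch for ch in hypothesis if ch.isalpha()]
    if not letters:
        return 1.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder weights, not the values used to train this model.
    chrf: float = 1.0
    bleu: float = 1.0
    cometkiwi: float = 1.0
    guardrail: float = 1.0


def combined_reward(
    chrf: float,
    bleu: float,
    cometkiwi: float,
    source: str,
    hypothesis: str,
    guardrails: Sequence[Guardrail],
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted mean of the quality scores minus the weighted mean guardrail penalty."""
    total = weights.chrf + weights.bleu + weights.cometkiwi
    quality = (
        weights.chrf * chrf + weights.bleu * bleu + weights.cometkiwi * cometkiwi
    ) / total
    if guardrails:
        penalty = sum(g(source, hypothesis) for g in guardrails) / len(guardrails)
    else:
        penalty = 0.0
    return quality - weights.guardrail * penalty
```

In this sketch a fluent English output keeps its full quality score, while an output that copies the Ukrainian source is pulled down by the Cyrillic guardrail even if a reference-free metric rates it highly.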

Training Details

The model was fully fine-tuned (no LoRA) for 300 steps on a single GPU; the best checkpoint was found at step 100. Training optimized translation output against the reward function described under Key Capabilities.
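
As background on how GRPO turns rewards into a training signal, the sketch below shows the group-relative advantage step in its usual formulation: several translations are sampled for the same source sentence, and each reward is normalized against the others in that group. The function name, the use of population standard deviation, and the epsilon value are assumptions made for illustration; the card does not describe the exact variant used here.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of completions sampled for ONE source sentence.

    Each advantage is (reward - group mean) / (group std + eps), so a translation
    is reinforced only if it scores above its siblings for the same prompt.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every sample in a group receives the same reward, all advantages are zero and that prompt contributes no gradient. This is one reason a reward with several components can help: it is more likely to separate candidate translations of the same sentence.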

Good For

  • Ukrainian-to-English Machine Translation: Specifically designed and optimized for this language pair.
  • Applications Requiring High Translation Fidelity: Suitable where translation accuracy matters and a compact 0.35B-parameter model is preferred.
  • Research in Reward-Driven NLP: Provides a practical example of applying GRPO to neural machine translation.