iamthewalrus67/kulyk-uk-en-grpo
The iamthewalrus67/kulyk-uk-en-grpo model is a 350-million-parameter Ukrainian-to-English translation model based on the LFM2-350M architecture. It is an improved version of Yehor/kulyk-uk-en, fine-tuned with GRPO (Group Relative Policy Optimization) using calibrated guardrail rewards on WikiMatrix data. It scores higher than its baseline on BLEU, chrF, and CometKiwi on both the FLoRes+ devtest and WMT24 uk-en benchmarks. Its primary application is reward-driven Ukrainian-to-English machine translation.
Overview
iamthewalrus67/kulyk-uk-en-grpo is a 350-million-parameter Ukrainian-to-English translation model. It builds on the Yehor/kulyk-uk-en (LFM2-350M) baseline and improves translation quality through GRPO (Group Relative Policy Optimization). The model was fine-tuned with calibrated guardrail rewards on 132,000 WikiMatrix uk-en sentence pairs.
Key Capabilities
- Improved Translation Quality: Scores higher than the Yehor/kulyk-uk-en baseline on BLEU, chrF, and CometKiwi.
- Reward-Driven Optimization: Trained with a reward that combines chrF, BLEU, and CometKiwi with five calibrated guardrails.
- Benchmark Performance: Achieves higher BLEU, chrF, and CometKiwi scores on both FLoRes+ devtest (sentence-by-sentence) and WMT24 uk-en (news domain, out-of-distribution) benchmarks.
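To make the idea of a quality reward combined with guardrails more concrete, here is a minimal sketch in plain Python. It is not the model's actual reward: the card does not say what the five guardrails are or how the metrics are weighted. The functions `char_ngram_f`, `length_guardrail`, and `combined_reward` are hypothetical names, the character n-gram score is a simplified single-order stand-in for chrF, and BLEU and CometKiwi are left out entirely.

```python
from collections import Counter


def char_ngram_f(hypothesis: str, reference: str, n: int = 3, beta: float = 2.0) -> float:
    """Simplified single-order character n-gram F-score (a stand-in for chrF)."""

    def ngrams(text: str) -> Counter:
        chars = text.replace(" ", "")
        return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

    hyp, ref = ngrams(hypothesis), ngrams(reference)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def length_guardrail(hypothesis: str, source: str, low: float = 0.5, high: float = 2.0) -> float:
    """Hypothetical guardrail: 1.0 if the output/source length ratio is plausible, else 0.0."""
    if not hypothesis or not source:
        return 0.0
    ratio = len(hypothesis) / len(source)
    return 1.0 if low <= ratio <= high else 0.0


def combined_reward(hypothesis: str, reference: str, source: str) -> float:
    """Quality score gated by the guardrail (assumed combination rule)."""
    return char_ngram_f(hypothesis, reference) * length_guardrail(hypothesis, source)
```

Under these assumptions, a translation that matches the reference but is repeated many times gets a reward of zero, because the length guardrail rejects it even though its n-gram overlap stays high. Multiplying by a gate is only one possible design; guardrails could equally be applied as additive penalties.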
Training Details
The model was fully fine-tuned (no LoRA) for 300 steps on a single GPU; the best checkpoint was at step 100. Training optimized translation output against a reward function combining chrF, BLEU, CometKiwi, and the five calibrated guardrails.
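GRPO, as commonly described, samples several candidate outputs per input and scores each one relative to the others in its group, rather than against a learned value function. The sketch below shows that normalization step only. The group size, the use of the population standard deviation, and the `eps` constant are assumptions; the card does not give the training configuration, and `group_relative_advantages` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of one group of sampled translations.

    Each advantage is (reward - group mean) / (group std + eps), so
    candidates better than their group's average get a positive signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate translations of one source sentence.
advantages = group_relative_advantages([0.20, 0.50, 0.50, 0.80])
```

When every candidate in a group receives the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason the reward needs to separate good translations from bad ones.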
Good For
- Ukrainian-to-English Machine Translation: Specifically designed and optimized for this language pair.
- Applications Requiring High Translation Fidelity: Suitable for use cases where accuracy and quality of translation are critical.
- Research in Reward-Driven NLP: Provides a practical example of applying GRPO to neural machine translation.