mrinaalarora/wordle-grpo-Qwen3-1.7B
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 29, 2026 · Architecture: Transformer

The mrinaalarora/wordle-grpo-Qwen3-1.7B model is a 2 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B by mrinaalarora. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. The model is optimized for tasks that benefit from structured reasoning and problem-solving; as its name suggests, the fine-tune targets a Wordle-style guessing task. It offers a 32,768 token context length, making it suitable for processing extensive inputs related to its fine-tuned capabilities.
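
For reference, here is a minimal usage sketch that loads the checkpoint with the Hugging Face transformers library, assuming the repo id above is published on the Hub. The Wordle-style prompt is hypothetical; the card does not document the prompt format used during GRPO training.

```python
# Minimal sketch (not from the model card): load the model in BF16,
# matching the precision listed in the metadata above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mrinaalarora/wordle-grpo-Qwen3-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, as listed in the model metadata
    device_map="auto",
)

# Hypothetical prompt; the exact training prompt format is undocumented here.
messages = [{"role": "user", "content": "Play Wordle: give your first five-letter guess."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```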
