zuhairsan/wordle-grpo-Qwen3-1.7B

TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 26, 2026Architecture:Transformer Cold

The zuhairsan/wordle-grpo-Qwen3-1.7B model is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B using the TRL framework. This model was specifically trained with the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its mathematical reasoning capabilities. It is designed for tasks requiring improved logical and mathematical problem-solving, building upon the foundational Qwen3 architecture.

Loading preview...

Model Overview

The zuhairsan/wordle-grpo-Qwen3-1.7B is a 1.7 billion parameter language model, fine-tuned from the base Qwen/Qwen3-1.7B model. This fine-tuning process utilized the TRL framework and incorporated a specialized training method known as GRPO (Gradient-based Reward Policy Optimization).

Key Capabilities

  • Enhanced Mathematical Reasoning: The model's training with GRPO, a method detailed in the DeepSeekMath paper, suggests an optimization for tasks involving mathematical and logical problem-solving.
  • Qwen3 Architecture: Benefits from the robust foundational architecture of the Qwen3 series, providing a strong base for general language understanding and generation.
  • TRL Framework Integration: Developed using the TRL library, indicating potential for further reinforcement learning-based fine-tuning or adaptation.

Good For

  • Applications requiring improved performance on mathematical or reasoning-intensive language tasks.
  • Researchers and developers interested in exploring the effects of GRPO on open-source language models.
  • General text generation where a compact yet capable model is desired, with an emphasis on logical coherence.