shaikabdulfahad/wordle-qwen2-mini
shaikabdulfahad/wordle-qwen2-mini is a 0.5-billion-parameter Qwen2-0.5B-Instruct model fine-tuned by Shaik Abdul Fahad using Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm. The model is designed specifically to play Wordle, learning strategy purely from reward signals rather than supervised examples. It exhibits strategic guessing behaviors, such as opening with vowel-rich words and using letter feedback effectively, and supports a context length of 32,768 tokens.
Overview
This model, developed by Shaik Abdul Fahad, is a fine-tuned Qwen2-0.5B-Instruct variant engineered to play the popular word game Wordle. Rather than relying on supervised fine-tuning, it uses Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm, to learn strategy directly from reward signals over 20 training games.
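GRPO's core idea can be sketched in a few lines: several guesses are sampled for the same game state, and each guess's reward is normalized against the group's own mean and standard deviation, removing the need for a separate value network. The function name, group size, and reward values below are illustrative, not taken from the author's training code.

```python
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Group-relative advantages: each sampled guess is scored
    against its own group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All guesses scored identically; no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four guesses sampled for one Wordle state, with
# hypothetical rewards from the environment.
advs = grpo_advantages([1.0, 0.3, 0.3, 0.0])
```

Guesses that beat the group average get a positive advantage and are reinforced; below-average guesses are penalized, which is how the model discovers strategies like vowel-rich openers without any labeled examples.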
Key Capabilities
- Strategic Wordle Play: Learns and applies effective Wordle strategies, such as opening with vowel-rich words (e.g., CRANE, SLATE), keeping confirmed green letters in position, repositioning yellow letters, and avoiding repeated guesses.
- Reinforcement Learning: Trained purely on reward signals, demonstrating an ability to learn complex game strategies without human-provided examples.
- Compact Size: Built on a 0.5 billion parameter base model, making it efficient for its specialized task.
Training Details
The model was trained using a reward system that incentivizes winning the game (+1.0), identifying green letters (+0.3), identifying yellow letters (+0.1), making a guess not tried before (+0.3), and using valid 5-letter words (+0.2). The training pipeline connected to a live Wordle environment (TextArena) via OpenEnv, generated guesses, received feedback, calculated rewards, and updated the model using GRPO.
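The reward scheme above can be sketched as a shaping function. This is a minimal illustration, not the author's pipeline: the signature is invented, and scoring greens and yellows per letter (rather than as a flat bonus) is an assumption.

```python
def wordle_reward(guess, feedback, previous_guesses, valid_words, won):
    """Shaped reward matching the values described above.
    `feedback` is a 5-char string of 'G'/'Y'/'-' marks."""
    reward = 0.0
    if won:
        reward += 1.0                      # winning the game
    reward += 0.3 * feedback.count("G")    # green letters (assumed per letter)
    reward += 0.1 * feedback.count("Y")    # yellow letters (assumed per letter)
    if guess not in previous_guesses:
        reward += 0.3                      # new, non-repeated guess
    if len(guess) == 5 and guess in valid_words:
        reward += 0.2                      # valid 5-letter word
    return reward

# Example: a fresh, valid guess with two greens and one yellow.
r = wordle_reward("CRANE", "GG--Y", set(), {"CRANE"}, won=False)
```

Summed over a game, these signals reward progress toward the answer even on losing games, which gives GRPO a denser learning signal than the win/loss outcome alone.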
Limitations
- Limited training (20 games) and model size (0.5B parameters) restrict its current performance.
- Occasionally repeats guesses despite built-in penalties.
Good for
- Research and experimentation in applying reinforcement learning to language models for game-playing.
- Understanding how LLMs can learn complex strategic tasks from reward signals.
- Demonstrating specialized AI agents for specific game environments.