GoldenGrapeGentleman1/pokemon-showdown-agent-v6
Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Apr 2, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

GoldenGrapeGentleman1/pokemon-showdown-agent-v6 is a 4 billion parameter Qwen/Qwen3-4B fine-tune, developed by GoldenGrapeGentleman1, specifically designed for next-action prediction from raw Pokemon Showdown replay logs. This model excels at generating precise action commands like 'move Earthquake' or 'switch Corviknight' directly from battle log prefixes. It was fine-tuned using Unsloth and TRL, with a focus on AMD ROCm workflows, and supports a context length of up to 32768 tokens for inference.


Pokemon Showdown Agent v6 Overview

GoldenGrapeGentleman1/pokemon-showdown-agent-v6 is a specialized 4 billion parameter model, fine-tuned from Qwen/Qwen3-4B, designed to predict the next action in a Pokemon Showdown battle directly from raw replay logs. Unlike models requiring hand-written state summaries, this agent learns from messy, real-world battle data.

Key Capabilities

  • Direct Action Prediction: Generates precise, short action commands (e.g., move [move-name], switch [pokemon-name]) suitable for agent pipelines.
  • Raw Log Processing: Learns directly from raw Pokemon Showdown replay logs, simplifying input requirements.
  • AMD ROCm Optimized: Developed with AMD ROCm workflows in mind; bfloat16 is recommended for stable inference.
  • Chat-style Prompting: Utilizes a system message to define the agent's side and a user message for the battle log prefix.

Training Details

The model was fine-tuned with LoRA SFT using Unsloth and TRL on the milkkarten/pokemon-showdown-replays-merged dataset. Training drew on 100,000 training games and 10,000 test games, yielding over 2.3 million training samples, with a training context length of up to 4096 tokens.
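A LoRA SFT run of this shape might look like the sketch below. Only the base model, dataset name, and 4096-token training context come from the card; every hyperparameter (rank, alpha, target modules, batch sizes) is an illustrative assumption, not the checkpoint's actual recipe, and `train_dataset` is a placeholder you would prepare from the replay data.

```python
# Hypothetical Unsloth + TRL LoRA SFT setup -- hyperparameters are assumptions.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B",
    max_seq_length=4096,   # training context length from the model card
    dtype=None,            # auto-selects bfloat16 where supported (e.g. ROCm)
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank (assumed)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # placeholder: formatted replay-log samples
    args=SFTConfig(per_device_train_batch_size=2,
                   gradient_accumulation_steps=4),
)
trainer.train()
```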

Intended Use Cases

  • Predicting next actions from raw Pokemon Showdown log prefixes.
  • Building text-only battle agents or evaluation harnesses.
  • Studying agent alignment from real replay trajectories.

Limitations

This model is a research checkpoint and not a complete battle engine. It may still produce illegal or strategically weak actions, and its reliability is sensitive to prompt wording. It does not include legality checks or full battle-state management.
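Since the checkpoint itself performs no legality checks, a downstream harness typically guards its output. A minimal sketch, assuming the caller can supply the set of currently legal actions (e.g. derived from the Showdown protocol's request messages) and a safe fallback:

```python
def choose_action(model_output: str, legal_actions: set[str], fallback: str) -> str:
    """Accept the model's predicted command only if it is currently legal.

    This wrapper is a sketch of a caller-side guard, not part of the model:
    `legal_actions` and `fallback` must come from the battle-state manager.
    """
    action = model_output.strip()
    return action if action in legal_actions else fallback
```

Because the model may also be strategically weak, a real harness would often log rejected predictions to measure illegal-action rates during evaluation.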