nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B
nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B is a 4-billion-parameter model, initialized from Qwen3-4B and trained as a Kuhn Poker specialist within the MARSHAL framework. Developed by Huining Yuan et al. as part of the MARSHAL project, it is trained through self-play with a turn-level advantage estimator and agent-specific advantage normalization, which together provide fine-grained credit assignment in multi-agent, multi-turn strategic games. It targets competitive imperfect-information games such as Kuhn Poker and generalizes beyond them: when integrated into multi-agent systems, it improves performance on reasoning benchmarks.
MARSHAL: Kuhn Poker Specialist
This model, nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B, is a 4-billion-parameter variant of Qwen3-4B, fine-tuned as a Kuhn Poker specialist within the MARSHAL framework. MARSHAL is an end-to-end reinforcement learning framework that improves multi-agent reasoning through self-play in competitive and cooperative games. It addresses the credit-assignment problem that arises in multi-agent, multi-turn settings: attributing a final game outcome to the individual turns and agents that produced it.
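Kuhn Poker itself is tiny: a three-card deck (J, Q, K), one private card per player, a 1-chip ante from each player, and at most one 1-chip bet. The sketch below encodes those standard rules and runs one self-play episode with a placeholder random policy, to make concrete what a multi-turn, imperfect-information episode looks like. The function names (`payoff`, `play_episode`, `random_policy`) and the `p`/`b` action encoding are hypothetical and not taken from MARSHAL's code; in a self-play setup like MARSHAL's, the same model would choose actions for both seats.

```python
import random
from typing import Callable, List, Optional, Tuple

CARDS = ["J", "Q", "K"]  # ranks, lowest to highest
RANK = {card: i for i, card in enumerate(CARDS)}


def payoff(history: str, cards: List[str]) -> Optional[int]:
    """Return player 0's net chips if `history` is terminal, else None.

    Hypothetical encoding: one char per turn, 'p' = pass (check or fold),
    'b' = bet (bet or call). Each player antes 1 chip; a bet adds 1 chip.
    The game is zero-sum, so player 1's payoff is the negation.
    """
    sign = 1 if RANK[cards[0]] > RANK[cards[1]] else -1  # showdown winner
    if history == "pp":            # both check: showdown for the antes
        return sign * 1
    if history in ("bb", "pbb"):   # bet is called: showdown for 2 chips each
        return sign * 2
    if history == "bp":            # player 0 bets, player 1 folds
        return 1
    if history == "pbp":           # player 1 bets, player 0 folds
        return -1
    return None                    # not terminal yet


Observation = Tuple[str, str]  # (own private card, public action history)


def random_policy(obs: Observation, rng: random.Random) -> str:
    """Placeholder policy; in self-play an LLM would pick the action."""
    return rng.choice("pb")


def play_episode(policy: Callable[[Observation, random.Random], str],
                 rng: random.Random):
    """Play one hand with `policy` acting for both players (self-play)."""
    cards = rng.sample(CARDS, 2)          # deal one private card to each player
    history = ""
    turns = []                            # (player, observation, action) per turn
    while (result := payoff(history, cards)) is None:
        player = len(history) % 2         # players alternate, player 0 first
        obs = (cards[player], history)    # a player never sees the other's card
        action = policy(obs, rng)
        turns.append((player, obs, action))
        history += action
    return turns, result


turns, result = play_episode(random_policy, random.Random(0))
print(turns, result)
```

Every episode lasts two or three turns and the reward only arrives at the end, which is exactly why per-turn credit assignment matters even in a game this small.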
Key Capabilities
- Specialized Game Play: Expert performance in Kuhn Poker, a competitive imperfect-information game.
- Advanced Credit Assignment: Utilizes a Turn-level Advantage Estimator for precise attribution of long-term outcomes to individual actions.
- Stable Training: Employs Agent-specific Advantage Normalization to stabilize the training process by calibrating advantage estimates.
- Generalization to Reasoning: Generalizes beyond games, improving performance on reasoning benchmarks when integrated into leading multi-agent systems (MASs).
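The two training components above can be illustrated with a minimal sketch. It assumes a GAE-style recursion applied over turns (one step per agent turn rather than per token) and a z-score normalization computed separately for each agent; both are assumptions made for illustration, and MARSHAL's actual estimator and normalization may differ in their details. The intuition behind per-agent normalization is that different seats or roles can have systematically different reward scales, so pooling them into one mean and standard deviation can let one agent's advantages dominate the update. The function names `turn_level_gae` and `normalize_per_agent` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List


def turn_level_gae(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """GAE over one agent's turns (assumed: one step per turn, not per token).

    rewards[t]: reward after turn t (often 0 except on the agent's last turn).
    values[t]:  value estimate at the start of turn t; the value after the
                final turn is treated as 0 because the game has ended.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = 0.0, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error of turn t
        running = delta + gamma * lam * running               # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    return advantages


def normalize_per_agent(advs: List[float], agents: List[str],
                        eps: float = 1e-8) -> List[float]:
    """Z-score advantages using each agent's own mean and std (assumed scheme)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for adv, agent in zip(advs, agents):
        groups[agent].append(adv)
    stats = {agent: (mean(v), pstdev(v)) for agent, v in groups.items()}
    return [(adv - stats[agent][0]) / (stats[agent][1] + eps)
            for adv, agent in zip(advs, agents)]


# Illustrative Kuhn Poker hand "pbb" won by player 0 (+2 chips, zero-sum).
# Player 0 acted on two turns, player 1 on one; the outcome is credited to
# each agent's last turn. The value estimates are made-up numbers.
p0_adv = turn_level_gae([0.0, 2.0], values=[0.1, 0.8])
p1_adv = turn_level_gae([-2.0], values=[-0.3])
print(normalize_per_agent(p0_adv + p1_adv, ["p0", "p0", "p1"]))
```

With `lam=1.0` the recursion reduces to return minus value, and with `lam=0.0` to the one-step TD error; intermediate values trade variance against bias. In a real batch, each agent's group would contain many turns, so its standard deviation is meaningful; a single-turn group, as in the toy example, simply normalizes to 0.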
Good For
- Research in Multi-Agent Reinforcement Learning: Particularly for understanding and developing strategic LLMs in game theory contexts.
- Strategic Game AI Development: Suited to applications that need agents to make multi-step decisions in competitive, imperfect-information environments.
- Enhancing Multi-Agent Systems: Can be integrated into MASs to boost performance on reasoning tasks, showing gains of up to +10.0% on AIME and +7.6% on GPQA-Diamond.