nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Nov 28, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B is a 4 billion parameter model, initialized from Qwen3-4B, specifically trained as a Kuhn Poker specialist within the MARSHAL framework. Developed by Huining Yuan et al. from the MARSHAL project, this model leverages self-play with a turn-level advantage estimator and agent-specific advantage normalization for fine-grained credit assignment in multi-agent, multi-turn strategic games. It excels in competitive imperfect-information games like Kuhn Poker and demonstrates generalization capabilities, improving performance on reasoning benchmarks when integrated into multi-agent systems.

Loading preview...

MARSHAL: Kuhn Poker Specialist

This model, nics-efc/MARSHAL-Kuhn-Poker-Qwen3-4B, is a 4 billion parameter variant of Qwen3-4B, specifically fine-tuned as a Kuhn Poker specialist within the innovative MARSHAL framework. MARSHAL is an end-to-end reinforcement learning framework designed to enhance multi-agent reasoning through self-play in various competitive and cooperative games. It addresses complex credit assignment challenges in multi-agent, multi-turn scenarios.

Key Capabilities

  • Specialized Game Play: Expert performance in Kuhn Poker, a competitive imperfect-information game.
  • Advanced Credit Assignment: Utilizes a Turn-level Advantage Estimator for precise attribution of long-term outcomes to individual actions.
  • Stable Training: Employs Agent-specific Advantage Normalization to stabilize the training process by calibrating advantage estimates.
  • Generalization to Reasoning: Demonstrates notable generalization, yielding performance improvements on reasoning benchmarks when integrated into leading multi-agent systems (MASs).

Good For

  • Research in Multi-Agent Reinforcement Learning: Particularly for understanding and developing strategic LLMs in game theory contexts.
  • Strategic Game AI Development: Ideal for applications requiring agents capable of complex decision-making in competitive, imperfect-information environments.
  • Enhancing Multi-Agent Systems: Can be integrated into MASs to boost performance on reasoning tasks, showing gains of up to +10.0% on AIME and +7.6% on GPQA-Diamond.