Name: jevonmao/llama31-8b-poker-mix-v1-step10k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jevonmao

Overview

jevonmao/llama31-8b-poker-mix-v1-step10k (PokerLlama-4) is an 8-billion-parameter supervised fine-tune of Meta's Llama-3.1-8B-Instruct. It specializes in heads-up no-limit Texas hold'em with 200-big-blind stacks and distills mixed strategies from the GTO Wizard equilibrium solver. The model can either emit an action directly or produce chain-of-thought reasoning, which makes it suitable for studying GTO-policy distillation into small language models.

Key Capabilities

High Accuracy: Achieves 83.96% top-1 action-type accuracy on a 31,105-decision held-out evaluation split (HoldemEval-31k), outperforming other 8B poker fine-tunes and the base Llama-3.1-8B-Instruct model.
Specialized Performance: Outperforms frontier reasoning models such as DeepSeek-V4-Pro and GPT-4.1/5.4-pro on postflop action prediction, showing what specialized distillation can achieve at 8B scale.
Tool-Use Support: Emits well-formed preflop_gto tool calls with 99.9% argument correctness, facilitating tool-use and function-call experiments in a poker domain.
Chain-of-Thought: Can generate chain-of-thought reasoning traces, which aids the study of GTO-policy distillation.
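The top-1 action-type metric above can be sketched as follows. This is a minimal illustration, not the project's evaluation code: it assumes predictions and references are strings whose first token is the action type (e.g. "bet 6.6"), that sizing is ignored when comparing types, and that the reference is the solver's highest-frequency action. The action vocabulary and helper names are hypothetical.

```python
from typing import Sequence

# Hypothetical action vocabulary; the real HoldemEval-31k label set may differ.
ACTION_TYPES = {"fold", "check", "call", "bet", "raise", "allin"}


def action_type(action: str) -> str:
    """Return the action type from a string like 'bet 6.6' (any sizing is ignored)."""
    tokens = action.strip().lower().split()
    head = tokens[0] if tokens else ""
    if head not in ACTION_TYPES:
        raise ValueError(f"unrecognized action: {action!r}")
    return head


def top1_action_type_accuracy(predicted: Sequence[str], solver_top: Sequence[str]) -> float:
    """Fraction of decisions where the predicted type equals the solver's modal action type."""
    if len(predicted) != len(solver_top):
        raise ValueError("predicted and reference lists differ in length")
    if not predicted:
        raise ValueError("empty evaluation set")
    hits = sum(action_type(p) == action_type(r) for p, r in zip(predicted, solver_top))
    return hits / len(predicted)


if __name__ == "__main__":
    preds = ["bet 6.6", "check", "call", "fold"]
    refs = ["bet 13.2", "check", "raise 30", "fold"]
    # A bet of a different size still counts as a type match; call vs raise does not.
    print(top1_action_type_accuracy(preds, refs))  # 0.75
```

Comparing types rather than exact sizings is a design choice of this sketch; an exact-match variant would simply compare the normalized strings.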

Intended Use Cases

Research Artifact: Primarily intended for reproducing project evaluation results and studying GTO-policy distillation into small language models.
Tool-Use Experiments: Useful for experiments involving tool-use and function-calling within a poker context.
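For tool-use experiments, one way to score the "argument correctness" of preflop_gto calls is to parse each emitted call and compare its arguments to a reference. The sketch below assumes calls are emitted as JSON objects with "name" and "arguments" keys; that wire format and the argument names in the example (position, hand, history) are hypothetical, since the actual preflop_gto schema is documented in the full model card rather than here.

```python
import json
from typing import Any, Mapping, Optional, Sequence


def parse_tool_call(raw: str) -> Optional[dict]:
    """Return the arguments of a well-formed preflop_gto call, or None if malformed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or call.get("name") != "preflop_gto":
        return None
    args = call.get("arguments")
    return args if isinstance(args, dict) else None


def argument_correctness(raw_calls: Sequence[str], references: Sequence[Mapping[str, Any]]) -> float:
    """Fraction of calls that parse and whose arguments exactly equal the reference."""
    if len(raw_calls) != len(references) or not raw_calls:
        raise ValueError("need equally sized, non-empty lists")
    correct = sum(parse_tool_call(raw) == dict(ref) for raw, ref in zip(raw_calls, references))
    return correct / len(raw_calls)


if __name__ == "__main__":
    # Hypothetical argument names, for illustration only.
    ref = {"position": "BTN", "hand": "AKs", "history": []}
    calls = [
        '{"name": "preflop_gto", "arguments": {"position": "BTN", "hand": "AKs", "history": []}}',
        '{"name": "preflop_gto", "arguments": {"position": "BB"',  # truncated JSON
    ]
    print(argument_correctness(calls, [ref, ref]))  # 0.5
```

Exact dictionary equality is the strictest check; a looser scorer might normalize hand notation or accept key-order and whitespace differences, which JSON parsing already absorbs.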

Limitations

Greedy Decoding Bias: The solver's policy is mixed, but greedy decoding collapses it to the modal (most frequent) action. This deterministic play is not necessarily EV-maximizing against non-equilibrium opponents.
Game Format Scope: Trained exclusively on heads-up no-limit Texas hold'em at 200 BB depth; not evaluated for other stack depths, player counts, or poker variants.
Not for Gambling: Explicitly not intended for real-money gambling or any context where financial harm could result from mis-prediction.
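The greedy-decoding limitation can be illustrated with a toy mixed strategy. The sketch contrasts always taking the modal action with sampling in proportion to the solver's frequencies. The spot and its frequencies are invented for illustration and do not come from GTO Wizard; whether the model's own output probabilities track solver frequencies closely enough for sampling to reproduce the mix is an open question this sketch does not address.

```python
import random
from collections import Counter
from typing import Mapping


def greedy_action(strategy: Mapping[str, float]) -> str:
    """Modal action of a mixed strategy: what greedy decoding approximates."""
    return max(strategy, key=strategy.__getitem__)


def sample_action(strategy: Mapping[str, float], rng: random.Random) -> str:
    """Draw one action in proportion to the mixing frequencies."""
    actions = list(strategy)
    return rng.choices(actions, weights=[strategy[a] for a in actions], k=1)[0]


if __name__ == "__main__":
    # Hypothetical river spot; frequencies are illustrative only.
    spot = {"check": 0.45, "bet 75%": 0.35, "bet 150%": 0.20}
    print(greedy_action(spot))  # check, on every visit to this spot
    rng = random.Random(0)
    n = 10_000
    counts = Counter(sample_action(spot, rng) for _ in range(n))
    # Sampled frequencies land near the solver's 45/35/20 mix.
    print({a: round(c / n, 2) for a, c in counts.items()})
```

Greedy play never bets in this spot, so an observant opponent can adjust to it; sampling preserves the mix at the cost of nondeterministic outputs.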
