jordanpainter/llama_gspo_200
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Mar 25, 2026 · Architecture: Transformer · Status: Cold

jordanpainter/llama_gspo_200 is an 8-billion-parameter language model fine-tuned from srirag/sft-llama-all. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper on pushing the limits of mathematical reasoning in open language models. The model is optimized for enhanced reasoning, particularly mathematical problem-solving and complex logical tasks.
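The key idea in GRPO, per the DeepSeekMath paper, is to drop the learned value function and use a group-relative baseline instead: for each prompt, several responses are sampled, and each response's advantage is its reward normalized against the mean and standard deviation of its own group. The sketch below illustrates just that normalization step in plain Python; it is not taken from this model's training code, and the function and variable names are illustrative.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and std of its own group (one group per prompt)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All rewards identical: this group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one math prompt, scored 1.0 if correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses scoring above their group's average get positive advantages and are reinforced; below-average ones are penalized, which is what pushes the policy toward more reliable reasoning without training a separate critic.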
