jordanpainter/qwen_grpo_100
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

jordanpainter/qwen_grpo_100 is an 8-billion-parameter language model fine-tuned from srirag/sft-qwen-all using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. The model specializes in mathematical reasoning and is suited to tasks requiring advanced logical and mathematical problem solving, with a 32,768-token context length.
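For context on the training method: the defining idea of GRPO is that it replaces a learned value (critic) baseline with a group-relative one. For each prompt, a group of responses is sampled and each response's reward is normalized against the group's mean and standard deviation. The sketch below illustrates only that advantage computation; function and variable names are illustrative, and this is not the actual training code used for this model.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (Group Relative Policy Optimization, as described in DeepSeekMath).
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each sampled response's reward against its group:
    A_i = (r_i - mean(r)) / (std(r) + eps).
    These per-group advantages stand in for a critic's value estimate,
    which is what makes GRPO critic-free."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: a group of 4 sampled answers to one math problem,
# scored 1.0 for a correct final answer and 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In training, these advantages weight the policy-gradient update for each response's tokens, so correct answers in a group are reinforced relative to incorrect ones without training a separate value network.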
