yangerine/grpo-baseline-lr1e5-l1
Task: Text generation | Model size: 4B | Quantization: BF16 | Context length: 32k | Architecture: Transformer | Concurrency cost: 1 | Published: Mar 31, 2026

The yangerine/grpo-baseline-lr1e5-l1 model is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B using GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to improve mathematical reasoning in large language models. It is intended for tasks that require multi-step mathematical problem-solving and logical deduction.
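
For context on the training method: GRPO samples a group of completions per prompt and normalizes each completion's reward against the group mean and standard deviation, avoiding the separate value network used by PPO-style methods. The snippet below is a schematic sketch of that group-relative advantage computation, not the actual training code behind this checkpoint; the function name and reward values are illustrative.

```python
# Schematic sketch of GRPO's group-relative advantage; illustrative only,
# not the training code used to produce this checkpoint.
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize per-completion rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    std = max(std, 1e-8)  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: four completions for one math prompt, rewarded 1.0 if the
# final answer is correct and 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```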

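Below is a minimal usage sketch, assuming the checkpoint follows the standard Hugging Face transformers interface it inherits from Qwen/Qwen3-4B; the sample prompt and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal usage sketch under the assumption that the checkpoint loads like
# its base model, Qwen/Qwen3-4B, via the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yangerine/grpo-baseline-lr1e5-l1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# An illustrative math-reasoning prompt; the chat template comes from the base model.
messages = [{"role": "user", "content": "What is the sum of the first 50 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```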