rajveer43/supply-chain-grpo-Qwen3-1.7B
Text generation · Concurrency cost: 1 · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 28, 2026 · Architecture: Transformer · Status: Warm

The rajveer43/supply-chain-grpo-Qwen3-1.7B model is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. The model targets tasks that benefit from advanced reasoning, particularly in mathematical contexts, and supports a 32,768-token context length. The GRPO fine-tuning is intended to improve performance on complex problem-solving scenarios.
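Because the model is fine-tuned from Qwen/Qwen3-1.7B, it can presumably be loaded through the standard Hugging Face transformers causal-LM interface. The sketch below is illustrative only; the example prompt, dtype, and generation settings are assumptions and are not taken from the model card.

```python
# Minimal usage sketch, assuming the model keeps the standard
# transformers causal-LM interface inherited from Qwen3-1.7B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rajveer43/supply-chain-grpo-Qwen3-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative placeholder prompt (not from the model card).
messages = [
    {
        "role": "user",
        "content": (
            "A warehouse holds 1,200 units and ships 75 units per day. "
            "After how many full days does stock fall below 300 units?"
        ),
    }
]

# Build the chat-formatted input and generate a response.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```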
