HuggingFaceAlbert/Qwen3-1.7B-grpo-1765505298
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Architecture: Transformer · Warm

HuggingFaceAlbert/Qwen3-1.7B-grpo-1765505298 is a 1.7 billion parameter language model fine-tuned from the Qwen3-1.7B base model. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to enhance its capabilities. The model is optimized for tasks requiring advanced mathematical reasoning and complex problem-solving, making it suitable for applications in scientific computing and quantitative analysis.
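The core idea of GRPO is to replace a learned value-function baseline with statistics computed over a group of completions sampled for the same prompt: each completion's reward is normalized against the group's mean and standard deviation. A minimal sketch of that normalization step, assuming the formulation from the DeepSeekMath paper (the function name and reward values below are illustrative, not part of this model's training code):

```python
# Sketch of GRPO's group-relative advantage: for each prompt, a group of
# G completions is sampled, and each completion's reward is normalized
# against the group's own mean and std (no learned value model needed).
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize per-completion rewards within their sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample standard deviation of the group
    return [(r - mu) / sigma for r in rewards]

# Example: 4 completions for one math prompt, scored 1.0 if the final
# answer was correct and 0.0 otherwise (a rule-based reward).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions scoring above the group mean receive a positive advantage and are reinforced; those below receive a negative one, which is what drives the policy toward higher-reward (e.g. mathematically correct) outputs.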
