movefast/Qwen2.5-7B-Instruct-GRPO
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Architecture: Transformer

movefast/Qwen2.5-7B-Instruct-GRPO is a 7.6-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization), a reinforcement-learning method applied here to strengthen mathematical reasoning. Building on the Qwen2.5 architecture, it is suited to applications that demand strong mathematical problem-solving.
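A minimal usage sketch, assuming the Hugging Face Transformers library; the repo id comes from this card, while the question and generation settings are illustrative:

```python
# Sketch: querying the model through its chat template with Transformers.
# Only the repo id is taken from the card; everything else is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "movefast/Qwen2.5-7B-Instruct-GRPO"

def solve(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and answer a single question via its chat template."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": question}]
    # apply_chat_template formats the ChatML prompt and tokenizes it.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Note that calling `solve(...)` downloads the full ~7.6B-parameter weights, so it requires a GPU (or substantial RAM) and the `accelerate` package for `device_map="auto"`.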
