eventhorizon28/cadforge-grpo-Qwen3-1.7B

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 26, 2026 · Architecture: Transformer

The eventhorizon28/cadforge-grpo-Qwen3-1.7B model is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B using the GRPO method introduced in the DeepSeekMath paper. It specializes in mathematical reasoning and is designed for tasks requiring robust mathematical problem-solving, building on the Qwen3 architecture with a 32,768-token context length.


Model Overview

The eventhorizon28/cadforge-grpo-Qwen3-1.7B is a 1.7 billion parameter language model, fine-tuned from the base Qwen/Qwen3-1.7B model. The fine-tuning used the TRL library with the GRPO (Group Relative Policy Optimization) method.

Key Capabilities and Training

The primary differentiator of this model is its training with GRPO, a reinforcement-learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This indicates a strong focus on enhancing the model's ability to handle complex mathematical reasoning tasks. The training stack comprised TRL 1.2.0, Transformers 5.7.0.dev0, PyTorch 2.8.0, Datasets 4.8.4, and Tokenizers 0.22.2.
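The central idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt and normalize each completion's reward against the group's own statistics, avoiding a separate learned value model. A minimal sketch of that group-relative advantage, assuming a scalar reward per sampled completion (function names are illustrative, not taken from the TRL implementation):

```python
# Sketch of the group-relative advantage used by GRPO
# (Group Relative Policy Optimization, DeepSeekMath).
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std.

    GRPO samples several completions for the same prompt and uses the
    group statistics as the baseline, instead of a critic network.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, scored 0/1 for correctness.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get a positive advantage, incorrect ones a negative advantage.
```

These advantages then weight the policy-gradient update on each completion's tokens, so the model is pushed toward answers that outperform their own sampling group.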

Use Cases

Given its specialized training with GRPO for mathematical reasoning, this model is particularly well-suited for:

  • Mathematical problem-solving: Tasks that require logical deduction and numerical computation.
  • Scientific and engineering applications: Where precise mathematical understanding is crucial.
  • Educational tools: For generating explanations or solutions to mathematical queries.

Users interested in leveraging a compact yet capable model for mathematical reasoning should consider this fine-tuned Qwen3 variant.
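A hypothetical loading sketch with the Hugging Face transformers library, for readers who want to try the checkpoint (the prompt-wrapping helper is illustrative and not part of the model card; only the model id comes from this page):

```python
# Illustrative usage sketch for eventhorizon28/cadforge-grpo-Qwen3-1.7B.
# The helper below is an assumption, not a documented prompt format.

def build_math_prompt(question: str) -> str:
    """Wrap a math question in a simple step-by-step instruction (illustrative)."""
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {question}\nSolution:"
    )

def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Download the checkpoint and generate an answer (needs network and a GPU/CPU with enough memory)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "eventhorizon28/cadforge-grpo-Qwen3-1.7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    inputs = tokenizer(build_math_prompt(question), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Loading in bfloat16 matches the BF16 quantization listed above; call `generate_answer("...")` with a concrete problem to run the model.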