Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule, is a fine-tuned variant of Qwen3-1.7B-Base, developed by Kazuki1450. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
Key Capabilities
- Enhanced Mathematical Reasoning: The primary differentiator of this model is its fine-tuning with GRPO, which aims to improve its ability to handle complex mathematical and logical reasoning tasks.
- Base Model Architecture: Built upon the Qwen3-1.7B-Base, it inherits the foundational language understanding and generation capabilities of the Qwen family.
- TRL Framework: Training was conducted with the TRL (Transformer Reinforcement Learning) library, indicating reinforcement-learning-based post-training rather than plain supervised fine-tuning.
Training Details
The model was trained with the GRPO method, which is designed to push the boundaries of mathematical reasoning in language models. Training used the following framework versions: TRL 0.29.0, Transformers 4.57.3, PyTorch 2.9.0, Datasets 4.0.0, and Tokenizers 0.22.1.
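The core idea behind GRPO is that, instead of learning a separate value model, each sampled completion's reward is normalized against the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation (a simplified illustration, not the training code used for this model):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each completion's reward
    against the mean and std of its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Hypothetical example: 4 completions sampled for one prompt,
# scored 1.0 (correct) or 0.0 (incorrect) by a rule-based reward.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # → [1.0, -1.0, 1.0, -1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy is pushed toward the better answers within each group. Note that the "dr_grpo" tag in the model name suggests a Dr. GRPO-style variant, which modifies this normalization; the sketch above shows the original formulation.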
Good For
- Applications requiring improved mathematical problem-solving.
- Tasks that benefit from enhanced logical reasoning.
- Developers looking for a compact model (1.7B parameters) with specialized reasoning capabilities.