Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule, is a fine-tuned variant of Qwen3-1.7B-Base, developed by Kazuki1450. It has roughly 2 billion parameters and supports a context length of 32,768 tokens.
Key Capabilities
- Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, with the goal of improving performance on mathematical reasoning tasks.
- Fine-tuned with TRL: The fine-tuning was performed with Hugging Face's TRL (Transformer Reinforcement Learning) library, which provides a GRPO trainer for optimizing model behavior with reinforcement learning.
- Base Model: Built upon the robust Qwen3-1.7B-Base, providing a strong foundation for general language understanding and generation.
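To make the GRPO training signal concrete, the following is a minimal sketch (not the author's code) of the group-relative advantage computation that gives GRPO its name: for each prompt, several completions are sampled, and each completion's reward is normalized against the mean and standard deviation of its own group rather than a learned value baseline.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in DeepSeekMath-style GRPO:
    normalize each completion's reward by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: rule-based correctness rewards for 4 sampled completions.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(group_rewards)
```

Correct completions receive positive advantages and incorrect ones negative, so the policy is pushed toward answers that beat its own sampled average; the "_rule" suffix in the model name suggests such a rule-based reward was used here, though the exact reward function is not documented.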
Good For
- Mathematical Problem Solving: Due to its GRPO-based training, this model is particularly well-suited for applications requiring accurate mathematical reasoning and problem-solving.
- Research and Development: A useful testbed for developers and researchers exploring the impact of GRPO on smaller language models, especially in the domain of mathematics.
- Applications balancing size and capability: Its roughly 2-billion-parameter scale makes it cheaper to run than larger models while still offering specialized mathematical-reasoning capability.
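For getting started, a minimal inference sketch using the standard transformers causal-LM API is shown below. This assumes the checkpoint loads like any other Qwen3-family model; the prompt is illustrative, and since the base model is not instruction-tuned, plain completion-style prompting is used.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule"

# Standard causal-LM loading; dtype is resolved from the checkpoint config.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Completion-style prompt, since this is a base (non-chat) model.
prompt = "Question: What is 12 * 7 + 5?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Downloading and running the checkpoint requires network access and a few gigabytes of memory; for constrained hardware, loading with a quantization backend supported by transformers is an option.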