Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_result_1p0_0p0_1p0_grpo_1_rule is a fine-tuned 1.7-billion-parameter language model based on Qwen/Qwen3-1.7B-Base, developed by Kazuki1450. It was trained with GRPO, a reinforcement-learning method designed to enhance mathematical reasoning, and supports a context length of 40,960 tokens, making it suited to tasks that require advanced mathematical understanding and problem solving.
Model Overview
This model, developed by Kazuki1450, is a fine-tuned version of Qwen3-1.7B-Base. It uses that base model as its foundation and was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO estimates advantages from groups of sampled completions rather than from a learned value function, and its use here indicates a strong focus on improving mathematical reasoning.
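To illustrate the idea behind GRPO (this is not Kazuki1450's training code), the method replaces a learned value baseline with group-relative advantages: each sampled completion's reward is normalized against the mean and standard deviation of the rewards in its group. A minimal sketch in plain Python:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each completion's reward by the
    group's mean and standard deviation (no separate value network)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for the same prompt, scored by some reward function.
rewards = [1.0, 0.0, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below are penalized, without the cost of training a critic model.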
Key Capabilities
- Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with the GRPO method, which is designed to significantly improve performance on mathematical tasks.
- Base Model: Built upon the Qwen3-1.7B-Base, it inherits the general language understanding and generation capabilities of the Qwen family.
- Fine-tuned with TRL: The model was fine-tuned using the TRL (Transformer Reinforcement Learning) library, a framework for applying reinforcement learning to transformer models.
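GRPO training requires a reward signal for every sampled completion, and the `_rule` suffix in the model name suggests a rule-based (verifiable) reward, a common choice for math-focused RL. The function below is a hypothetical example of such a reward, not the one actually used for this model: it extracts the last number in a completion and compares it to the reference answer.

```python
import re

def rule_based_math_reward(completion: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last number in the
    completion equals the reference answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference else 0.0

# A correct and an incorrect completion for "What is 6 * 7?"
print(rule_based_math_reward("6 * 7 = 42, so the answer is 42", "42"))  # -> 1.0
print(rule_based_math_reward("I think the answer is 41", "42"))         # -> 0.0
```

Rewards of this shape plug directly into group-relative training: each group of sampled answers is scored, and completions with the correct final answer are reinforced relative to their peers.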
Good For
- Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning, such as solving equations, understanding mathematical concepts, or generating mathematical explanations.
- Research and Development: Useful for researchers exploring the impact of GRPO on language models and mathematical capabilities.
- Applications requiring Qwen3-1.7B-Base with improved math skills: Suitable for use cases where the base Qwen model's general abilities are needed, but with an added specialization in mathematics.