The Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_Continue_1p0_0p0_1p0_grpo_42_rule model is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method known for strengthening mathematical reasoning in large language models. Building on its Qwen3-1.7B-Base foundation, the model is optimized for tasks that demand advanced mathematical reasoning and problem solving, and its 32K context length supports complex problem statements and detailed reasoning chains.
Model Overview
This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with roughly 1.7 billion parameters and a 32,768-token context length. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
Key Capabilities
- Enhanced Mathematical Reasoning: GRPO fine-tuning is the model's primary differentiator, aimed at substantially improving its ability to understand and solve complex mathematical problems.
- Continued Training: Built upon the robust Qwen3-1.7B-Base, it leverages the foundational capabilities of the Qwen family.
- Long Context Window: A 32K token context length allows for processing extensive problem descriptions and generating detailed, multi-step solutions.
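To make the GRPO idea above concrete, here is a minimal sketch of its group-relative advantage computation as described in the DeepSeekMath paper: a group of completions is sampled per prompt, each is scored, and the score is normalized against the group's own mean and standard deviation instead of a learned value network. The reward values below are hypothetical, not taken from this model's training run.

```python
# Sketch of GRPO's group-relative advantage normalization
# (DeepSeekMath, arXiv:2402.03300). Rewards here are illustrative.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std.

    GRPO samples several completions per prompt and uses the
    normalized per-group reward as the advantage signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, scored 1.0 when
# the final answer matched the reference (a rule-based reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get positive advantages, incorrect ones negative.
```

Because the baseline comes from the group itself, this avoids training a separate critic, which is one reason GRPO is attractive for reasoning-focused fine-tuning.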
Good For
- Mathematical Problem Solving: Ideal for applications requiring advanced mathematical reasoning, calculations, and logical deduction.
- Research in LLM Training: Useful for researchers exploring the impact of GRPO and similar reinforcement learning techniques on model performance, particularly in specialized domains.
- Complex Query Handling: Its long context window makes it suitable for tasks where detailed input and output are necessary, such as explaining mathematical concepts or deriving proofs.
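A hedged sketch of using the model for the math problem solving described above, via Hugging Face `transformers`. The repo id is taken from this card; the prompt format and generation settings are illustrative assumptions (the Base lineage suggests no chat template, so a plain completion-style prompt is used here).

```python
# Illustrative usage sketch; prompt format and generation settings
# are assumptions, not documented behavior of this model.
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_Continue_1p0_0p0_1p0_grpo_42_rule"

def build_math_prompt(question: str) -> str:
    # Simple completion-style prompt; a chat template is not assumed
    # for a Base-derived model.
    return f"Problem: {question}\nSolve step by step.\nSolution:"

if __name__ == "__main__":
    # Deferred import so the prompt helper stays importable without
    # the heavy dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    prompt = build_math_prompt("If 3x + 5 = 20, what is x?")
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```

For long multi-step derivations, raising `max_new_tokens` exploits the 32K context window noted above.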