Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, a reinforcement learning method designed to enhance mathematical reasoning in open language models, using the TRL framework. The model is intended for applications where robust logical and mathematical reasoning matters.
Model Overview
This model, published by Kazuki1450, is a fine-tuned version of Qwen3-1.7B-Base with approximately 1.7 billion parameters. It was trained using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Training was carried out with the TRL (Transformer Reinforcement Learning) library.
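The model can be loaded like any Hugging Face causal language model. A minimal usage sketch with the `transformers` library follows; the prompt and generation settings are illustrative, not part of the model's documentation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule"

# Download the tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative math prompt; as a base-style model, plain text completion is used.
prompt = "Question: If 3x + 5 = 20, what is x?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that loading the full-precision weights requires roughly 4 GB of memory; `torch_dtype="auto"` picks the dtype stored in the checkpoint.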
Key Capabilities
- Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with GRPO, a method aimed at improving mathematical reasoning in language models.
- Fine-tuned Qwen3-1.7B-Base: Builds upon the foundational capabilities of the Qwen3-1.7B-Base model, adapting it for specialized tasks.
- TRL Framework: Training was conducted with the TRL library, which provides implementations of policy-optimization methods such as GRPO, PPO, and DPO.
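In TRL, GRPO training is driven by one or more reward functions that score each sampled completion; the "rule" suffix in the model name suggests a rule-based reward, though the actual rules, dataset, and hyperparameters are not documented here. The wiring below is a hypothetical sketch only (the reward function, dataset, and config values are assumptions):

```python
# Toy rule-based reward in the shape TRL's GRPOTrainer expects:
# it receives a batch of completions and returns one float per completion.
def rule_reward(completions, answer=None, **kwargs):
    """Hypothetical rule: reward 1.0 if the reference answer string appears."""
    return [1.0 if answer is not None and answer in c else 0.0 for c in completions]

# Training wiring (not executed here; needs a GPU and `pip install trl datasets`):
#
# from trl import GRPOConfig, GRPOTrainer
# config = GRPOConfig(output_dir="qwen3-grpo", num_generations=8, seed=42)
# trainer = GRPOTrainer(
#     model="Qwen/Qwen3-1.7B-Base",
#     reward_funcs=rule_reward,
#     args=config,
#     train_dataset=my_math_dataset,  # hypothetical prompt/answer dataset
# )
# trainer.train()
```

GRPO sidesteps a learned value model by sampling a group of completions per prompt and normalizing rewards within the group, which is why `num_generations` appears in the config.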
Good For
- Mathematical Problem Solving: Ideal for applications requiring the model to understand and solve mathematical problems or perform logical reasoning.
- Research and Development: Useful for researchers exploring the impact of GRPO on language model performance, particularly in mathematical domains.
- Specialized Language Tasks: Suitable for use cases where a base Qwen model's reasoning capabilities need to be augmented through targeted fine-tuning.