Kazuki1450/Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 11, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule is a 2 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for assistant-like interactions, particularly in scenarios requiring structured responses or problem-solving.


Overview

This model, developed by Kazuki1450, is a fine-tuned version of the Qwen3-1.7B-Base architecture, featuring approximately 2 billion parameters. It leverages the Transformer Reinforcement Learning (TRL) framework for its training process.
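Since the checkpoint is a standard Transformers model, it can be loaded and queried in the usual way. A minimal sketch, assuming the card's BF16 precision; the prompt and generation settings are illustrative, not the author's recommendations:

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint in BF16 (matching the card) and generate a completion."""
    # Imports kept local so the sketch reads without the libraries installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Structured, step-by-step prompts play to the model's GRPO math tuning.
    print(generate("Solve step by step: what is 17 * 24?"))
```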

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was specifically trained using the GRPO (Group Relative Policy Optimization) method, as introduced in the "DeepSeekMath" paper, indicating a focus on improving mathematical problem-solving and reasoning.
  • Assistant-like Interactions: The fine-tuning targets conversational, assistant-style applications, producing coherent and relevant responses to user queries.
  • Base Model Adaptability: Built upon Qwen3-1.7B-Base, it inherits the foundational language understanding and generation capabilities of the Qwen family.

Training Details

The training procedure utilized TRL version 0.23.0, with Transformers 4.57.1 and PyTorch 2.7.1+cu128. The application of the GRPO method is a key differentiator, aiming to push the model's performance in complex reasoning tasks, particularly those involving mathematical concepts.
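A GRPO run of this kind can be sketched with TRL's `GRPOTrainer`. The rule-based reward and dataset below are assumptions for illustration only: the "_rule" suffix in the model name suggests a rule-based reward, but its exact form, and the training data, are not documented.

```python
# Hedged sketch of GRPO fine-tuning with TRL; hyperparameters are placeholders.

def rule_based_reward(completions, **kwargs):
    """Toy rule-based reward: 1.0 for completions with a boxed final answer."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

def main():
    # Imports kept local so the reward rule reads without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Assumed math dataset; GRPOTrainer expects a "prompt" column.
    train_dataset = load_dataset("openai/gsm8k", "main", split="train")
    train_dataset = train_dataset.map(lambda x: {"prompt": x["question"]})

    args = GRPOConfig(
        output_dir="qwen3-1.7b-grpo",
        num_generations=8,            # completions sampled per prompt (the GRPO "group")
        per_device_train_batch_size=8,
        learning_rate=1e-6,
        bf16=True,                    # matches the BF16 precision listed above
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=rule_based_reward,
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```

GRPO scores each group of sampled completions with the reward function and normalizes rewards within the group, so no separate value model is trained; this is what makes it attractive for compact models like this one.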

Good for

  • Applications requiring a compact language model with improved mathematical reasoning.
  • Developing AI assistants that need to handle structured queries or problem-solving scenarios.
  • Research into the effectiveness of GRPO for fine-tuning base models on specific cognitive tasks.