Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule
Text Generation · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 12, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule is a language model with roughly 2 billion (1.7B) parameters, fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method introduced in the DeepSeekMath paper. With a context length of 40960 tokens, it targets tasks that require advanced mathematical reasoning, and its training methodology suggests a focus on numerical and logical problem-solving in open language models.


Model Overview

This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, featuring approximately 2 billion parameters and a substantial context length of 40960 tokens. Its development leveraged the TRL framework for training.

Key Capabilities

  • Mathematical Reasoning: The model's training incorporates GRPO (Group Relative Policy Optimization), a technique introduced in the "DeepSeekMath" paper for pushing the limits of mathematical reasoning in open language models. This suggests an enhanced ability to solve complex mathematical problems.
  • Base Model Enhancement: It builds upon the foundational capabilities of the Qwen3-1.7B-Base model, implying a strong general language understanding and generation base, now specialized for mathematical tasks.
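The core idea of GRPO can be sketched in a few lines: for each prompt, a group of responses is sampled, and each response's reward is normalized against the group's mean and standard deviation to form its advantage, removing the need for a learned value function. A minimal illustration (not the model's actual training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std.

    This is the group-relative advantage at the heart of GRPO: only a
    group of sampled responses to the same prompt is needed, no critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, scored 0/1 by a rule.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers end up with positive advantage and incorrect ones with negative advantage, so the policy gradient pushes probability mass toward the better responses within each group.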

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) framework. The development environment used TRL 0.23.0, Transformers 4.57.1, PyTorch 2.7.1+cu128, Datasets 4.4.1, and Tokenizers 0.22.1. GRPO is the central element of the fine-tuning process, aimed at improving performance on mathematical and logical problem-solving.
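The "_rule" suffix in the model name suggests a rule-based, verifiable reward was used during GRPO training, as is common for mathematical reasoning tasks. The sketch below is a hypothetical example of such a reward function; the extraction pattern and scoring are assumptions, not the author's actual implementation:

```python
import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Score 1.0 if the completion's final answer matches the reference,
    else 0.0 (a common verifiable-reward rule for math tasks)."""
    # Prefer an explicitly boxed answer; otherwise fall back to the
    # last number that appears in the completion.
    boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if boxed:
        candidate = boxed[-1].strip()
    else:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        candidate = numbers[-1] if numbers else ""
    return 1.0 if candidate == gold_answer.strip() else 0.0
```

A binary reward like this pairs naturally with GRPO's group-relative advantages: within a sampled group, correct responses are rewarded relative to incorrect ones without any learned reward model.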