Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 11, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule is a language model with approximately 2 billion parameters, fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, a reinforcement learning method designed to enhance mathematical reasoning. With a context length of 40960 tokens, it is suited to tasks requiring multi-step mathematical problem-solving and logical deduction.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, featuring approximately 2 billion parameters and a substantial context length of 40960 tokens. It was developed by Kazuki1450 and built upon the original Qwen model.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It has been fine-tuned using GRPO (Group Relative Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This specialized training is intended to substantially strengthen the model's mathematical reasoning and complex problem-solving.
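The exact training script for this checkpoint is not published. The following is a minimal sketch of what GRPO fine-tuning with TRL's `GRPOTrainer` might look like; the reward function and dataset are hypothetical placeholders (the `_rule` suffix in the model name hints at a rule-based reward, but its actual definition is not documented).

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: favor completions that produce a boxed answer.
# The reward actually used for this checkpoint is not documented.
def rule_based_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

# Placeholder math dataset; GRPOTrainer expects a "prompt" column.
train_dataset = load_dataset("openai/gsm8k", "main", split="train")
train_dataset = train_dataset.rename_column("question", "prompt")

training_args = GRPOConfig(output_dir="qwen3-1.7b-grpo")
trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```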

Technical Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformer Reinforcement Learning) version 0.23.0
  • Parameter Count: Approximately 2 billion
  • Context Length: 40960 tokens
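These values can be checked locally by inspecting the published configuration. A minimal sketch, assuming the checkpoint is reachable on the Hugging Face Hub:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule"
)
print(config.model_type)               # expected: "qwen3"
print(config.max_position_embeddings)  # expected: 40960, per the card above
```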

Potential Use Cases

Given its GRPO training, this model is particularly well-suited to applications requiring:

  • Mathematical problem-solving: From basic arithmetic to more complex algebraic or calculus-based queries.
  • Logical reasoning tasks: Where structured thought processes and deduction are critical.
  • Scientific computing assistance: Generating or interpreting mathematical expressions and concepts.

Developers can quickly integrate and experiment with this model using the Hugging Face `transformers` text-generation pipeline, as sketched below.
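A minimal sketch of such a pipeline follows. The prompt is illustrative; BF16 precision and automatic device placement mirror the quantization listed in the card and assume a suitable GPU is available.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule",
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```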