Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper to enhance mathematical reasoning, and is optimized for tasks that require improved reasoning capabilities, with a 32768-token context length.
Model Overview
This model, developed by Kazuki1450, is a fine-tuned variant of Qwen3-1.7B-Base, with roughly 1.7 billion parameters and a 32768-token context length. It was trained using the TRL framework.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology: GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for tasks that benefit from enhanced reasoning, particularly in mathematical contexts.
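Per the DeepSeekMath paper, GRPO drops PPO's learned value model and instead scores each sampled completion against the other completions in its group. A minimal sketch of that group-relative advantage computation (the function name and toy rewards are illustrative, not from this model's training code):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against the mean/std of its sampled group,
    as in GRPO, where the group baseline replaces a learned value model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: a group of 4 sampled completions with rule-based rewards.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group average get positive advantages and are reinforced; the rest are pushed down, with no separate critic network to train.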
Technical Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformers Reinforcement Learning)
- Parameter Count: Approximately 1.7 billion
- Context Length: 32768 tokens
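The card does not include the training script, but given the TRL framework listed above, a GRPO run with a rule-based reward might look like the hedged sketch below (`GRPOTrainer`/`GRPOConfig` are assumed from recent TRL releases; the reward function and dataset are placeholders, not the ones used for this model):

```python
def length_reward(completions, **kwargs):
    """Toy rule-based reward: prefer completions near 200 characters.
    Returns one score per completion, as TRL reward functions expect."""
    return [-abs(200 - len(c)) / 200.0 for c in completions]

if __name__ == "__main__":
    # Heavy imports and the actual run are guarded; this downloads the base model.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    args = GRPOConfig(output_dir="grpo-output", num_generations=4)
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=length_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

The "rule" suffix in the model ID suggests a rule-based reward of some kind, but its exact form is not documented here.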
Use Cases
Given its GRPO training, this model is likely suitable for applications requiring:
- Improved reasoning capabilities, especially in structured or logical problem-solving.
- Tasks that benefit from the principles outlined in the DeepSeekMath paper, potentially including mathematical reasoning or complex logical deductions.
Quick Start Example
Users can quickly integrate the model using the transformers pipeline for text generation to explore its reasoning capabilities.
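A minimal sketch using the transformers text-generation pipeline (the model ID comes from this card; the prompt, sampling settings, and helper name are illustrative):

```python
from transformers import pipeline

MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule"

def build_prompt(question):
    """Frame the question to elicit step-by-step reasoning."""
    return f"Question: {question}\nAnswer step by step:"

if __name__ == "__main__":
    # Loading the model downloads ~1.7B parameters of weights.
    generator = pipeline("text-generation", model=MODEL_ID)
    prompt = build_prompt("A train travels 60 km in 45 minutes. "
                          "What is its average speed in km/h?")
    result = generator(prompt, max_new_tokens=256, do_sample=False)
    print(result[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) is used here for reproducible outputs; sampling parameters can be tuned for more varied generations.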