Name: Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Kazuki1450

Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of Qwen3-1.7B-Base, with approximately 2 billion parameters and a 32,768-token context window. It was developed by Kazuki1450 and fine-tuned using the TRL library.

Key Differentiator: GRPO Fine-tuning

A core aspect of this model is its training method, GRPO (Group Relative Policy Optimization). The technique was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" as a reinforcement learning method for improving a model's mathematical reasoning. GRPO samples a group of completions for each prompt and scores every completion relative to the rest of its group, which removes the need for a separate value model. By applying GRPO, this Qwen3-based model aims to improve performance on tasks that demand logical and mathematical processing.
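The group-relative scoring at the heart of GRPO can be illustrated with a minimal sketch. It assumes one scalar reward per sampled completion and normalizes by the population standard deviation; the function name, the `eps` value, and the example rewards are illustrative assumptions, not details taken from this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions: two judged correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below it receive a negative one. When all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.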

Training Details

The model was trained using TRL (Transformer Reinforcement Learning) with the following framework versions: TRL 0.29.0, Transformers 4.57.3, PyTorch 2.9.0, Datasets 4.0.0, and Tokenizers 0.22.1. The training process is publicly viewable via Weights & Biases.
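To compare a local environment against the versions listed above, a small check along these lines may help. It is only a sketch: the PyPI package names (`torch` for PyTorch in particular) and the helper name `version_mismatches` are assumptions, and matching versions are not stated anywhere as a requirement for running the model.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the training details; package names are assumed PyPI names.
EXPECTED = {
    "trl": "0.29.0",
    "transformers": "4.57.3",
    "torch": "2.9.0",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def version_mismatches(expected=EXPECTED, lookup=version):
    """Return {package: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except PackageNotFoundError:
            found = None  # package is not installed
        # Local build tags such as "2.9.0+cu128" are treated as matching "2.9.0".
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `version_mismatches()` with no arguments inspects the current environment; an empty dictionary means every listed package is installed at the listed version.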

Use Cases

Given its GRPO-based training, this model is aimed at applications requiring:

Mathematical problem-solving
Logical reasoning tasks
Complex analytical queries

Developers can quickly integrate the model using the provided transformers pipeline for text generation tasks.
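A minimal sketch of that integration is shown here, assuming the weights can be loaded from the Hugging Face Hub under the model name above and that `transformers` and PyTorch are installed. The helper names, the prompt, and the `max_new_tokens` value are illustrative. Because this is a fine-tune of a base model, the sketch passes a plain-text prompt rather than chat messages.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule"


def load_generator(model_id=MODEL_ID):
    """Build a text-generation pipeline (downloads the weights on first use)."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def complete(generator, prompt, max_new_tokens=128):
    """Return only the newly generated continuation for a plain-text prompt."""
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # drop the prompt from the returned text
    )
    return outputs[0]["generated_text"]


# Example usage (requires network access and enough memory for the model):
#   generator = load_generator()
#   print(complete(generator, "Question: What is 17 + 25?\nAnswer:"))
```

Keeping the pipeline construction separate from `complete` means the model is loaded once and reused across prompts.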
