Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base with a 40960-token context length. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and is suited to tasks requiring advanced logical and mathematical problem-solving on top of the base Qwen3 architecture.
Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned version of the 1.7 billion parameter Qwen/Qwen3-1.7B-Base model. Its 40960-token context length lets it process extensive inputs.
Key Training Details
The model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO scores a group of sampled completions per prompt and computes each completion's advantage relative to the group, which makes it well suited to reward signals from mathematical and logical reasoning tasks.
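The core idea of GRPO's group-relative advantage can be sketched in a few lines. This is an illustrative simplification, not the training code used for this model: for each prompt, several completions are sampled, each receives a scalar reward, and the advantage is the reward normalized against the group's mean and standard deviation.

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std (GRPO-style).

    A degenerate group (all rewards equal) gets zero advantages.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

For a group of rule-based rewards like `[1.0, 0.0, 1.0, 0.0]` (correct vs. incorrect answers), the correct completions get positive advantages and the incorrect ones negative, steering the policy toward answers that beat the group average.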
Frameworks Used
Training was conducted using the TRL library, with specific versions including:
- TRL: 0.23.0
- Transformers: 4.57.1
- PyTorch: 2.7.1+cu128
- Datasets: 4.4.1
- Tokenizers: 0.22.1
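To reproduce this environment, the versions above can be pinned with pip (a sketch, assuming a CUDA 12.8 build of PyTorch is available for your platform; the exact PyTorch index URL may differ):

```shell
pip install trl==0.23.0 transformers==4.57.1 datasets==4.4.1 tokenizers==0.22.1
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
```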
Potential Use Cases
Given its fine-tuning with the GRPO method, this model is likely well-suited for applications requiring:
- Mathematical problem-solving
- Logical reasoning tasks
- Complex question answering where numerical or logical deduction is critical
Developers can get started with text generation using a standard transformers pipeline.
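A minimal sketch of such a pipeline is shown below. The model id comes from this card; the prompt, `max_new_tokens` value, and the `generate` helper name are illustrative choices, and loading the 1.7B checkpoint requires downloading its weights.

```python
from transformers import pipeline

MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule"

def generate(prompt, max_new_tokens=256):
    """Run text generation with the fine-tuned model (downloads weights on first use)."""
    generator = pipeline("text-generation", model=MODEL_ID)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
```

Calling `generate("If 3x + 5 = 20, what is x?")` would return the prompt followed by the model's completion; sampling parameters such as `temperature` can be passed through the pipeline call as needed.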