Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained with GRPO (Group Relative Policy Optimization), as introduced in the DeepSeekMath paper, to enhance its capabilities. It is designed for general text generation tasks, combining its base architecture with specialized training for improved performance.
Overview
This model, named Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule, is a 1.7 billion parameter language model derived from the Qwen/Qwen3-1.7B-Base architecture. It has been fine-tuned using the TRL framework.
Key Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Fine-tuning Framework: TRL
- Training Method: GRPO (Group Relative Policy Optimization), a method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for reasoning tasks, particularly mathematical ones.
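To illustrate the idea behind GRPO: instead of learning a value function, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. The sketch below shows only that group-relative advantage computation, as described in the DeepSeekMath paper; it is not taken from this model's actual training code, and the example rewards are hypothetical.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (illustrative; not this model's actual training code).
from statistics import mean, stdev


def group_relative_advantages(rewards):
    """For a group of completions sampled from one prompt, compute
    A_i = (r_i - mean(r)) / std(r), the baseline-free advantage
    GRPO uses in place of a learned value function."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1e-8  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]


# Four completions for one prompt, scored by a rule-based reward
# (e.g., 1.0 if the final answer is correct, 0.0 otherwise):
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions scoring above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. In TRL, this machinery is provided by its GRPO trainer, so fine-tuning scripts only supply the prompts and a reward function.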
Capabilities
- Text Generation: Capable of generating human-like text based on given prompts.
- Reasoning: The use of GRPO implies an enhanced focus on reasoning capabilities, particularly mathematical reasoning, the domain the DeepSeekMath paper targets.
Usage
Developers can integrate this model with the transformers library for text generation tasks. It is suitable for applications that require a compact yet capable language model with improved reasoning characteristics from its specialized training.
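A typical loading pattern with the transformers library is sketched below. The generation settings (`max_new_tokens`, greedy decoding defaults) are illustrative assumptions, not values from the model card, and as a base-derived model it expects plain-text prompts rather than a chat template.

```python
# Quick-start sketch for loading this model with transformers
# (standard AutoModel usage; generation settings are illustrative).
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the model on first call and return the completion for `prompt`."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example: generate("The derivative of x^2 is")
```

Calling `generate(...)` downloads roughly 3.5 GB of weights on first use; `device_map="auto"` places the model on a GPU when one is available.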