Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule, is a fine-tuned version of Qwen3-1.7B-Base trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The training aims to improve the model's performance on complex reasoning tasks.
Key Capabilities
- Enhanced Reasoning: Fine-tuning with GRPO targets stronger multi-step reasoning, particularly in mathematical domains.
- Qwen3-1.7B-Base Foundation: Built upon the robust Qwen3-1.7B-Base model, it inherits a strong base for general language understanding and generation.
- Extended Context Window: Features a 32,768-token context length, allowing it to process and generate longer texts while maintaining coherence across the full window.
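At its core, GRPO replaces a learned value baseline with a group-relative one: several completions are sampled per prompt, and each completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch of that normalization step (function name and reward values are illustrative, not taken from this model's training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each sampled completion's reward
    against the mean and (population) std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four completions sampled for one prompt, scored by a binary rule reward.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Because the baseline comes from the group itself, no separate value model is needed, which is one of the efficiency arguments made in the DeepSeekMath paper.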
Good For
- Mathematical Problem Solving: Ideal for applications requiring advanced mathematical reasoning and problem-solving, benefiting from the GRPO training.
- Complex Logical Tasks: Suitable for scenarios where improved logical deduction and structured thinking are crucial.
- Research and Development: Provides a foundation for further experimentation and fine-tuning on tasks that demand high-quality reasoning from a 1.7 billion parameter model.
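The `_rule` suffix in the model name suggests a rule-based (verifiable) reward was used during GRPO training. A hypothetical sketch of such a reward for math problems, assuming answers are emitted in `\boxed{...}` form (the helper names are illustrative, not from this repository):

```python
import re

def boxed_answer(text):
    """Return the contents of the last \\boxed{...} span, or None if absent."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def rule_reward(completion, gold):
    """Binary rule-based reward: 1.0 on an exact answer match, else 0.0."""
    answer = boxed_answer(completion)
    return 1.0 if answer is not None and answer == gold.strip() else 0.0

print(rule_reward(r"... so the answer is \boxed{42}.", "42"))  # → 1.0
```

Real rule-based graders typically also normalize equivalent forms (e.g. `1/2` vs `0.5`); exact string match is the simplest variant.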