Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule is an approximately 2 billion parameter (1.7B) language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper, and is optimized for tasks that benefit from mathematical reasoning and robust problem-solving. With a 32,768 token context length, it suits applications that require deep contextual understanding and precise logical inference.


Overview

This model, developed by Kazuki1450, is a fine-tuned version of Qwen3-1.7B-Base, with approximately 2 billion parameters and a 32,768 token context length. It was trained using the TRL framework with GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggesting an optimization toward stronger mathematical and reasoning capabilities.
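The core idea of GRPO is to drop the learned value function used by PPO and instead normalize each completion's reward against the other completions sampled for the same prompt. A minimal sketch of that group-relative advantage computation (illustrative only; the actual TRL `GRPOTrainer` handles batching, clipping, and the KL term):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and standard deviation of its own group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by a rule-based reward:
advantages = group_relative_advantages([1.0, 0.0, 0.5, 1.0])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are penalized, all without training a separate critic.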

Key Capabilities

  • Mathematical Reasoning: Leverages the GRPO training method, indicating a focus on improving mathematical problem-solving and logical inference.
  • Base Model Foundation: Built upon the Qwen3-1.7B-Base architecture, providing a strong general language understanding foundation.
  • Extended Context Window: Supports a 32768 token context length, enabling processing of longer inputs and maintaining coherence over extended dialogues or documents.

Good For

  • Mathematical Tasks: Ideal for applications requiring robust mathematical reasoning, complex calculations, or logical deduction.
  • Research and Development: Suitable for researchers exploring advanced reinforcement learning techniques like GRPO in language models.
  • Context-Heavy Applications: Beneficial for use cases where understanding and generating text based on extensive context is crucial.
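A minimal sketch of running the model with the Hugging Face `transformers` library. The plain-text prompt format below is an assumption (this is a base-model fine-tune, so no chat template is presumed); adjust it to whatever format the model was actually trained on.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule"

def build_prompt(question: str) -> str:
    # Hypothetical formatting helper, not part of the model card.
    return f"Question: {question}\nAnswer:"

if __name__ == "__main__":
    # Deferred import so the helper above is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the model metadata.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    inputs = tokenizer(build_prompt("What is 17 * 24?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading a ~2B parameter model in BF16 requires roughly 4 GB of memory plus activation overhead; for longer contexts toward the 32k limit, budget additional memory for the KV cache.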