Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities, and is optimized for assistant-style interactions, combining the base architecture with specialized training for stronger conversational and reasoning performance. The model supports a context length of 40,960 tokens, making it suitable for processing longer inputs.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with 1.7 billion parameters and a context length of 40,960 tokens. It was developed by Kazuki1450 and trained using the TRL framework.
Key Capabilities
- Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to improve its mathematical and general reasoning abilities.
- Assistant-like Interactions: Fine-tuned for conversational and assistant roles, making it suitable for generating helpful and coherent responses to user queries.
- Long Context Handling: With a 40960-token context window, it can process and understand extensive inputs, maintaining coherence over longer dialogues or documents.
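For assistant-style use, prompts to Qwen-family models follow the ChatML format (role-tagged turns delimited by `<|im_start|>` and `<|im_end|>`). In practice you would call the tokenizer's `apply_chat_template`, which requires downloading the model files; the pure-Python sketch below only illustrates the assumed prompt layout:

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Format a list of {'role', 'content'} turns as a ChatML prompt,
    ending with an open assistant turn for the model to complete.
    (Illustrative sketch; the tokenizer's apply_chat_template is the
    authoritative formatter for this model.)"""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
```

The resulting string would then be tokenized and fed to the model; within the 40,960-token window this layout can carry long multi-turn histories or documents.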
Training Details
The model's training procedure utilized the TRL (Transformer Reinforcement Learning) framework. The application of the GRPO method, detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," is a core aspect of its fine-tuning, aiming to bolster its logical and mathematical problem-solving skills.
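At the core of GRPO, as described in the DeepSeekMath paper, is a group-relative advantage: several completions are sampled per prompt, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its own group. The `_rule` suffix in the model name suggests a rule-based reward, though its exact form is not documented here; the reward below is a hypothetical stand-in used only to make the sketch runnable:

```python
import statistics

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the completion ends with
    the reference answer, else 0.0. The actual rule used to train this
    model is not documented."""
    return 1.0 if completion.strip().endswith(gold_answer) else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's group-relative advantage: normalize each reward against
    the mean and (population) std of its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all completions scored the same; no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Score a group of sampled completions for one math prompt.
group = [
    "2 + 2 = 4, so the answer is 4",
    "I think the answer is 5",
    "Adding gives the answer is 4",
    "The result is unclear",
]
rewards = [rule_based_reward(c, "4") for c in group]
advantages = group_relative_advantages(rewards)
```

In a full TRL setup these advantages weight the policy-gradient update inside the GRPO trainer; this sketch only illustrates the reward-normalization step that distinguishes GRPO from value-function-based PPO.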