thangvip/qwen3-1.7b-grpo-sft-base
The thangvip/qwen3-1.7b-grpo-sft-base model is a 1.7-billion-parameter language model developed by thangvip and fine-tuned from thangvip/qwen3-1.7b-base-sft-math-1500. It is trained with the GRPO method introduced in the DeepSeekMath paper, which specializes it for mathematical reasoning. The model is primarily intended for tasks that require robust mathematical problem-solving and logical deduction.
Overview
thangvip/qwen3-1.7b-grpo-sft-base is a 1.7-billion-parameter language model fine-tuned by thangvip. It builds upon the base model thangvip/qwen3-1.7b-base-sft-math-1500 and incorporates the GRPO (Group Relative Policy Optimization) training method. GRPO, detailed in the DeepSeekMath paper, estimates advantages from groups of sampled completions rather than from a separate value model, and is specifically designed to push the limits of mathematical reasoning in language models.
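A minimal inference sketch using the Hugging Face transformers library, assuming the model ships with a standard tokenizer and config. The `build_math_prompt` helper and its \boxed{} convention are assumptions common to math-tuned models, not something documented by this card:

```python
# Inference sketch for thangvip/qwen3-1.7b-grpo-sft-base.
# The prompt wrapper below is an assumed convention, not an official API.

MODEL_ID = "thangvip/qwen3-1.7b-grpo-sft-base"

def build_math_prompt(question: str) -> str:
    # Ask for step-by-step reasoning and a \boxed{} final answer.
    return (
        f"{question}\n"
        "Please reason step by step and put the final answer in \\boxed{}."
    )

def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the prompt helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_math_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Note that the first call to `generate_answer` downloads the model checkpoint from the Hub.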
Key Capabilities
- Enhanced Mathematical Reasoning: Optimized through GRPO for superior performance on mathematical tasks.
- Fine-tuned from a Math-focused Base: Benefits from its origin as a math-specialized SFT model.
- TRL Framework: Trained using the Transformer Reinforcement Learning (TRL) library.
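Since the card names TRL, a training run like this one can be sketched with TRL's `GRPOTrainer`. The dataset, reward function, and hyperparameters below are illustrative assumptions, not the author's actual recipe:

```python
# Hypothetical GRPO training sketch with TRL. The reward function,
# dataset choice, and hyperparameters are assumptions for illustration.

def boxed_format_reward(completions, **kwargs):
    # With prompt-only (standard-format) datasets, TRL passes completions
    # as plain strings; reward completions that contain a \boxed{} answer.
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

def train():
    # Imported lazily so the reward function stays dependency-free.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Assumed dataset; GRPOTrainer expects a "prompt" column.
    dataset = load_dataset("openai/gsm8k", "main", split="train")
    dataset = dataset.rename_column("question", "prompt")

    args = GRPOConfig(output_dir="qwen3-1.7b-grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="thangvip/qwen3-1.7b-base-sft-math-1500",  # the stated SFT base
        reward_funcs=boxed_format_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

In practice a GRPO recipe for math would also include an accuracy reward that checks the extracted answer against the reference solution; the format reward here is only the simplest testable example.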
Good for
- Mathematical Problem Solving: Ideal for applications requiring accurate and robust mathematical reasoning.
- Research in RLHF for Math: Useful for exploring and building upon GRPO-based training methodologies.
- Developing Math-centric AI Assistants: Suitable as a foundation for agents focused on numerical and logical challenges.