Model Overview
jordanpainter/qwen_grpo_100 is an 8-billion-parameter language model fine-tuned from the srirag/sft-qwen-all base model. The fine-tuning was performed with the TRL library using GRPO (Group Relative Policy Optimization).
Key Capabilities
- Mathematical Reasoning: GRPO was introduced in the DeepSeekMath paper, and fine-tuning with it indicates a strong focus on improving mathematical problem-solving ability.
- Instruction Following: As a fine-tuned model, it is designed to follow user instructions effectively, as demonstrated by the quick start example.
- Extended Context: Supports a 32,768-token context window, allowing it to process long inputs and maintain coherence over extended conversations or documents.
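As a minimal quick-start sketch, the model can be loaded through the standard Hugging Face `transformers` chat interface. This assumes the repository ships a chat template inherited from its Qwen base; the example question and generation settings are illustrative.

```python
MODEL_ID = "jordanpainter/qwen_grpo_100"

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format expected by
    tokenizer.apply_chat_template."""
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 512) -> str:
    # Heavy dependencies are imported lazily so the helpers above can be
    # used without loading transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Render the chat template, then generate and strip the prompt tokens
    # from the decoded output.
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("What is 12 * 17? Show your reasoning."))
```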
Training Details
Training runs were logged to Weights & Biases, where the training process can be inspected. GRPO, as detailed in the DeepSeekMath paper, was proposed to push the limits of mathematical reasoning in open language models.
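A GRPO run of this kind can be sketched with TRL's `GRPOTrainer`. The dataset choice, reward function, and hyperparameters below are illustrative assumptions, not the actual recipe used to train this model; only the base model id comes from this card.

```python
def correctness_reward(completions, answer, **kwargs):
    # Hypothetical reward: 1.0 if the ground-truth answer string appears
    # in the completion, else 0.0. GRPO computes advantages relative to
    # the group of completions sampled per prompt.
    return [1.0 if ans in c else 0.0 for c, ans in zip(completions, answer)]

def main():
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Assumed math dataset; GRPOTrainer expects a "prompt" column.
    dataset = load_dataset("openai/gsm8k", "main", split="train")
    dataset = dataset.rename_column("question", "prompt")

    args = GRPOConfig(
        output_dir="qwen_grpo_100",
        num_generations=8,          # completions sampled per prompt
        max_completion_length=512,
    )
    trainer = GRPOTrainer(
        model="srirag/sft-qwen-all",  # base model named in this card
        reward_funcs=correctness_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```

The reward function here is a stand-in; a real run would typically parse the final answer from the completion before comparing.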
Good For
- Applications requiring strong mathematical and logical reasoning.
- Tasks that benefit from reliable instruction following and a long context window.
- Research and development in reinforcement learning for language models, particularly those focused on mathematical domains.