Name: zhaohq/PureRL-1.5B-v5-06-uccp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

This model, zhaohq/PureRL-1.5B-v5-06-uccp, is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned variant of the Qwen/Qwen2.5-Math-1.5B base model, trained with TRL, Hugging Face's Transformer Reinforcement Learning library.

Key Capabilities

Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper ("DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models").
Fine-tuned Performance: By applying GRPO to a math-focused base model, it aims to improve performance on tasks that require step-by-step logical and analytical reasoning.
Context Handling: Supports a context length of 32768 tokens, which allows longer prompts and longer multi-step interactions.
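
The core idea of GRPO mentioned above is that each sampled answer is scored relative to the other answers drawn for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative normalization follows. The function name, the `eps` guard against zero variance, and the use of the population standard deviation are assumptions made for illustration; this is not the author's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled answer to the same prompt.
    `eps` is an assumed safeguard for groups where every reward is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When every answer in a group gets the same reward, all advantages are zero and that group contributes no learning signal.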

Good For

Mathematical Problem Solving: Ideal for applications that involve numerical reasoning, complex calculations, and structured problem-solving.
Logical Deduction: Suitable for tasks requiring the model to follow logical steps and derive conclusions from given information.
Research and Development: Provides a foundation for further experimentation with RL-based fine-tuning methods on mathematical and reasoning-intensive tasks.
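
For readers experimenting with RL-based fine-tuning on math tasks, a rule-based correctness reward is a common starting point. The sketch below is hypothetical and is not taken from this model's training setup: it assumes the model writes its final answer inside `\boxed{...}` and that an exact string match against a reference answer is an acceptable check. The names `extract_boxed` and `correctness_reward` are illustrative.

```python
import re


def extract_boxed(text):
    # Return the contents of the last \boxed{...} in the text, or None.
    # Nested braces (e.g. \boxed{\frac{1}{2}}) are not handled in this sketch.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def correctness_reward(completion, reference):
    # 1.0 when the boxed answer exactly matches the reference string, else 0.0.
    answer = extract_boxed(completion)
    if answer is not None and answer == reference.strip():
        return 1.0
    return 0.0


reward = correctness_reward(r"6 * 7 = 42, so the answer is \boxed{42}.", "42")
```

Exact string matching is deliberately strict: it treats "0.5" and "1/2" as different answers, so a real experiment would likely need answer normalization or symbolic comparison.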
