uc-rl/Qwen2.5-3B-UCRL

Hosted on Hugging Face

Text generation · Model size: 3.1B · Precision: BF16 · Context length: 32k · Published: Nov 7, 2025 · Architecture: Transformer

uc-rl/Qwen2.5-3B-UCRL is a 3.1 billion parameter causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by uc-rl, this model specializes in mathematical reasoning and problem-solving, leveraging the GRPO training method. It is optimized for verifiable coding problems and tasks requiring robust logical deduction, offering a 32768-token context length.


Overview

uc-rl/Qwen2.5-3B-UCRL is a 3.1 billion parameter language model, fine-tuned from the Qwen2.5-3B-Instruct base model. It has been specifically trained on the chansung/verifiable-coding-problems dataset to enhance its capabilities in mathematical reasoning and problem-solving.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to improve its handling of complex mathematical and logical tasks.
  • Verifiable Coding Problem Solving: Fine-tuning on a dataset of verifiable coding problems makes it particularly adept at generating and understanding code-related solutions that can be programmatically checked.
  • Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-3B-Instruct base.
  • Extended Context: Supports a context length of 32768 tokens, allowing for processing longer prompts and more complex problem descriptions.
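The "verifiable" in verifiable coding problems means a candidate solution can be checked programmatically and scored with a binary reward. A minimal sketch of such a checker is below; `verify_solution` and the `solve` entry-point convention are hypothetical illustrations, not the dataset's actual harness.

```python
def verify_solution(candidate_src: str, test_cases: list[tuple]) -> float:
    """Reward 1.0 only if the candidate passes every test case.

    Hypothetical reward function; the dataset's real harness may differ.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate's function
        solve = namespace["solve"]      # assumed entry-point name
        for args, expected in test_cases:
            if solve(*args) != expected:
                return 0.0
        return 1.0
    except Exception:
        return 0.0  # crashes or missing definitions earn no reward

# Example: a correct vs. an incorrect implementation of addition
tests = [((1, 2), 3), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b"
bad = "def solve(a, b):\n    return a - b"
```

Because the reward is computed by executing code rather than by a learned reward model, it cannot be gamed by superficially plausible answers.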

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework. The core training methodology, GRPO, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," which aims to improve mathematical reasoning in large language models.
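GRPO's central idea is to sample a group of completions per prompt and use the group's own reward statistics as the baseline, avoiding a separate value model. The sketch below shows only the group-relative advantage computation; the full GRPO objective in DeepSeekMath also includes a clipped policy ratio and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's statistics.

    A_i = (r_i - mean(r)) / std(r) -- the group baseline used by GRPO,
    simplified from the DeepSeekMath formulation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mu) / sigma for r in rewards]

# Rewards for a group of 4 sampled completions of one prompt
# (e.g. binary pass/fail from a verifiable-coding reward)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# → [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their group's average get a positive advantage and are reinforced; in TRL, this logic is encapsulated by the `GRPOTrainer` class.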