swadeshb/Qwen2.5-3B-Instruct-CRPO-V35 is a 3.1-billion-parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. Building on the Qwen2.5 architecture, the model is suited to tasks that require robust instruction following and, potentially, mathematical reasoning.
Model Overview
swadeshb/Qwen2.5-3B-Instruct-CRPO-V35 is a 3.1-billion-parameter instruction-tuned model, developed by swadeshb by fine-tuning the base Qwen/Qwen2.5-3B-Instruct model. The fine-tuning used the TRL library and the GRPO (Group Relative Policy Optimization) training method.
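The exact training setup for this model is not published, but a GRPO run with TRL generally follows the pattern below. The reward function, dataset, and hyperparameters here are purely illustrative assumptions, not the author's actual configuration:

```python
# Hypothetical sketch of GRPO fine-tuning with TRL's GRPOTrainer.
# The reward function, dataset, and hyperparameters are illustrative;
# the actual recipe behind CRPO-V35 is not documented.

def reward_exact_answer(completions, answer=None, **kwargs):
    """Toy reward: 1.0 if a completion contains the reference answer.

    TRL passes extra dataset columns (here, a hypothetical `answer`
    column) to reward functions as keyword arguments.
    """
    if answer is None:
        answer = [""] * len(completions)
    return [1.0 if a and a in c else 0.0 for c, a in zip(completions, answer)]

def train():
    # Heavy dependencies are imported here so the reward function
    # above stays usable on its own.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    config = GRPOConfig(
        output_dir="qwen2.5-3b-grpo",
        num_generations=8,  # completions sampled per prompt for the group baseline
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-3B-Instruct",
        reward_funcs=reward_exact_answer,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO scores a group of sampled completions per prompt and uses the group's mean reward as the baseline, which is why `num_generations` is a central knob in this setup.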
Key Capabilities
- Instruction Following: Optimized for understanding and responding to user instructions, building on the Qwen2.5-3B-Instruct foundation.
- GRPO Training: Uses the training method described in the DeepSeekMath paper, which focuses on pushing the limits of mathematical reasoning in open language models. This suggests potential strengths in structured reasoning tasks.
- Efficient Inference: As a 3.1 billion parameter model, it offers a balance between capability and computational efficiency, suitable for various deployment scenarios.
Good For
- Applications requiring a compact yet capable instruction-following model.
- Tasks that could benefit from improved reasoning, particularly in areas where GRPO has shown advantages.
- Developers looking to integrate a fine-tuned Qwen2.5 variant with specific training enhancements.
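For developers integrating the model, a minimal inference sketch with the Hugging Face Transformers `pipeline` API is shown below. Weights are downloaded on first use, so treat this as an illustrative pattern rather than a tested recipe; the prompt is a made-up example:

```python
# Minimal chat inference sketch using the Transformers text-generation
# pipeline. The model downloads on first use; prompt is illustrative.

def build_messages(system: str, user: str) -> list:
    """Assemble a conversation in the messages format chat templates expect."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def generate(prompt: str) -> str:
    from transformers import pipeline  # heavy import kept local

    pipe = pipeline(
        "text-generation",
        model="swadeshb/Qwen2.5-3B-Instruct-CRPO-V35",
    )
    messages = build_messages("You are a helpful assistant.", prompt)
    out = pipe(messages, max_new_tokens=256)
    # The pipeline returns the full conversation; the last message
    # is the assistant's reply.
    return out[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(generate("What is 17 * 23?"))
```

Because the model shares the Qwen2.5 chat template, it can be dropped into any stack that already serves Qwen2.5-3B-Instruct.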