Name: LichengLiu03/Qwen2.5-3B-UFO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: LichengLiu03

Overview of Qwen2.5-3B-UFO

LichengLiu03/Qwen2.5-3B-UFO is a 3.1 billion parameter model built upon the Qwen2.5-3B-Instruct architecture. Its core innovation lies in the Unary Feedback as Observation (UFO) framework, which addresses the challenge of multi-turn reasoning in LLMs. Traditional single-turn reinforcement learning models often fail to incorporate feedback effectively, repeating errors in interactive scenarios.

Key Capabilities & Differentiators

Multi-Turn Reasoning: The UFO framework transforms static datasets into multi-turn training by treating minimal "Try Again" feedback as part of the observation, enabling the model to learn from historical mistakes and revise its reasoning iteratively.
Enhanced Mathematical Performance: Trained with PPO on the MetaMathQA dataset, it shows a 14% improvement in multi-turn success rates and a 10% reduction in average interaction turns for mathematical problems compared to single-turn baselines.
Answer Diversity: The model achieves 90% non-repetitive answers, significantly higher than the 80% baseline, due to a repetition penalty in its reward design.
Efficient Problem Solving: An exponential reward decay mechanism encourages solving problems in fewer turns, leading to more efficient reasoning.

Good For

Mathematical Reasoning: Optimized for complex math problems, logical reasoning, and accurate calculation steps.
Interactive Problem Solving: Ideal for applications where models need to iteratively refine answers based on simple negative feedback.
Learning from Sparse Feedback: Demonstrates effectiveness in scenarios where only minimal "Try Again" signals are available for improvement.