zhangchenxu/TinyV-1.5B
TEXT GENERATION
Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights
TinyV-1.5B by zhangchenxu is a 1.5-billion-parameter language model, fine-tuned from Qwen2.5-1.5B-Instruct, that serves as a reward verifier for efficient Reinforcement Learning (RL) post-training. It specializes in detecting false negatives produced by rule-based verifiers, i.e. answers that are correct but rejected because they do not match the expected format, and thereby provides more accurate reward signals during RL training. The model improves both RL efficiency and final model performance at only about 6% additional computational cost, making it well suited for optimizing RL workflows.
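As a minimal sketch of how such a false-negative-aware reward could be wired into an RL loop: the `tinyv_judges_equivalent` stub below is hypothetical and merely stands in for an actual prompt to TinyV-1.5B; the real model's prompt format and API are not specified on this page.

```python
def rule_based_verifier(answer: str, reference: str) -> bool:
    # Cheap exact-match rule. Prone to false negatives when the answer
    # is correct but formatted differently (e.g. "0.5" vs "1/2").
    return answer.strip() == reference.strip()

def tinyv_judges_equivalent(answer: str, reference: str) -> bool:
    # Hypothetical stub standing in for a call to TinyV-1.5B, which
    # would judge whether `answer` is semantically equivalent to
    # `reference` despite surface differences.
    equivalents = {("0.5", "1/2"), ("1/2", "0.5")}
    return (answer.strip(), reference.strip()) in equivalents

def reward(answer: str, reference: str) -> float:
    # Accept if the cheap rule-based check passes; otherwise consult
    # the verifier model to catch a false negative before assigning
    # zero reward.
    if rule_based_verifier(answer, reference):
        return 1.0
    if tinyv_judges_equivalent(answer, reference):
        return 1.0
    return 0.0
```

The rule-based check stays on the fast path, so the model is only invoked on apparent failures, which is consistent with the small reported overhead.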
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model.
temperature: – · top_p: – · top_k: – · frequency_penalty: – · presence_penalty: – · repetition_penalty: – · min_p: –