zhangchenxu/TinyV-1.5B
TEXT GENERATION
Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights
TinyV-1.5B by zhangchenxu is a 1.5-billion-parameter language model, fine-tuned from Qwen2.5-1.5B-Instruct, that serves as a reward verifier for efficient Reinforcement Learning (RL) post-training. It specializes in detecting false negatives produced by rule-based verifiers, i.e. answers that are correct but rejected because they do not match the expected format, and thereby provides more accurate reward signals during RL training. The model improves both RL efficiency and final model performance at only about 6% additional computational cost, making it well suited for optimizing RL workflows.
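As a minimal sketch of how such a false-negative-aware reward could be wired into an RL loop: the `tinyv_judges_equivalent` stub below is hypothetical and merely stands in for an actual prompt to TinyV-1.5B; the real model's prompt format and API are not specified on this page.

```python
def rule_based_verifier(answer: str, reference: str) -> bool:
    # Cheap exact-match rule. Prone to false negatives when the answer
    # is correct but formatted differently (e.g. "0.5" vs "1/2").
    return answer.strip() == reference.strip()

def tinyv_judges_equivalent(answer: str, reference: str) -> bool:
    # Hypothetical stub standing in for a call to TinyV-1.5B, which
    # would judge whether `answer` is semantically equivalent to
    # `reference` despite surface differences.
    equivalents = {("0.5", "1/2"), ("1/2", "0.5")}
    return (answer.strip(), reference.strip()) in equivalents

def reward(answer: str, reference: str) -> float:
    # Accept if the cheap rule-based check passes; otherwise consult
    # the verifier model to catch a false negative before assigning
    # zero reward.
    if rule_based_verifier(answer, reference):
        return 1.0
    if tinyv_judges_equivalent(answer, reference):
        return 1.0
    return 0.0
```

The rule-based check stays on the fast path, so the model is only invoked on apparent failures, which is consistent with the small reported overhead.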
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model.
temperature: – · top_p: – · top_k: – · frequency_penalty: – · presence_penalty: – · repetition_penalty: – · min_p: –