RLVER/PPO-non-thinking
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Jul 4, 2025 | License: license | Architecture: Transformer

RLVER/PPO-non-thinking is a 7.6-billion-parameter model developed by RLVER with a 32,768-token context length. It is designed for tasks that call for direct policy execution without an explicit reasoning step, favoring fast, direct responses. It is best suited to settings where rapid action selection from a pre-trained policy matters most, such as specialized control and decision-making applications.
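For rough capacity planning, the listed figures (7.6B parameters, FP8 quantization, 32,768-token context) can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative only: the helper names are hypothetical, it assumes FP8 stores one byte per weight, and it assumes the context window is shared between the prompt and the generated tokens. It ignores KV cache, activations, and runtime overhead.

```python
# Rough sizing sketch. The three constants come from the listing;
# the helper names are hypothetical and not part of any RLVER API.

PARAMS = 7.6e9          # 7.6 billion parameters
BYTES_PER_PARAM = 1     # assumption: FP8 stores one byte per weight
CONTEXT_TOKENS = 32768  # context length from the description


def weight_memory_gib(params: float = PARAMS,
                      bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


def max_new_tokens(prompt_tokens: int,
                   context_tokens: int = CONTEXT_TOKENS) -> int:
    """Tokens left for generation after the prompt, never below zero."""
    return max(context_tokens - prompt_tokens, 0)


if __name__ == "__main__":
    print(f"weights: ~{weight_memory_gib():.2f} GiB")
    print(f"room after a 30,000-token prompt: {max_new_tokens(30_000)} tokens")
```

Under these assumptions the weights alone come to about 7.1 GiB, so actual serving memory will be higher once the KV cache for long contexts is included.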
