Overview
SIRI (Scaling Iterative Reinforcement Learning with Interleaved Compression) is a framework developed by THU-KEG to improve the efficiency and accuracy of Large Reasoning Models (LRMs). Traditional reinforcement learning (RL) often leads to "overthinking" and redundant reasoning traces. While prior methods compress outputs to improve efficiency, they typically sacrifice accuracy.
SIRI addresses this trade-off by iteratively alternating between compression and expansion of the reasoning budget, managed by a cosine length scheduler. This dynamic approach allows the model to balance concise, high-density reasoning with longer, exploratory planning, ultimately improving accuracy while reducing average token usage.
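The cosine length scheduler can be pictured as a token budget that oscillates between a compressed floor and an expanded ceiling over training. The sketch below is a minimal illustration of that idea; the function name, parameter names, and all numeric values (`l_min`, `l_max`, `cycles`) are hypothetical, not SIRI's actual configuration.

```python
import math

def length_budget(step, total_steps, l_min=2048, l_max=8192, cycles=3):
    """Hypothetical cosine length scheduler.

    The reasoning-token budget oscillates between an expanded ceiling
    (l_max, exploratory phase) and a compressed floor (l_min, concise
    phase) over `cycles` full cosine periods across training.
    All names and values here are illustrative assumptions.
    """
    phase = 2 * math.pi * cycles * step / total_steps
    # cos(0) = 1, so training starts at the expanded budget l_max,
    # then alternates down toward l_min and back as `step` advances.
    return l_min + (l_max - l_min) * 0.5 * (1 + math.cos(phase))
```

Under this sketch, compression phases (budget near `l_min`) force high-density reasoning, while expansion phases (budget near `l_max`) restore room for exploratory planning, matching the interleaved schedule the paragraph describes.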
Key Capabilities
- Interleaved Compression–Expansion: Dynamically switches between phases that force concise reasoning and phases that allow for broader exploration.
- Token Efficiency with Accuracy Gains: Uniquely improves reasoning accuracy while simultaneously reducing the average number of tokens used, unlike previous compression methods.
- Iterative RL Training: Built upon GRPO with modifications from DAPO, enhancing training stability and performance.
- Generalization: The SIRI framework has been validated across different model sizes, including 1.5B and 7B parameters.
Good For
- Applications requiring efficient and accurate reasoning from large language models.
- Scenarios where balancing reasoning depth with computational cost is critical.
- Developers looking for LRMs optimized to avoid redundant thought processes while maintaining high performance.