Kwai-Klear/GoLongRL-4B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:May 19, 2026License:mitArchitecture:Transformer0.0K Open Weights Warm

Kwai-Klear/GoLongRL-4B is a 4 billion parameter model developed by Kwai-Klear, focusing on long-context reinforcement learning with verifiable rewards (RLVR). It utilizes a capability-oriented dataset and TMN-Reweight for optimizing heterogeneous rewards, achieving strong long-context performance. The model excels in tasks requiring precise retrieval, comprehension, and numerical reasoning over extended contexts, while preserving general capabilities.

Loading preview...

GoLongRL-4B: Long-Context Reinforcement Learning

GoLongRL-4B is a 4 billion parameter model from Kwai-Klear, specifically designed for long-context reinforcement learning with verifiable rewards (RLVR). This model introduces a novel post-training recipe that significantly enhances performance on tasks requiring extensive context understanding and processing. The framework is fully open-source, including its dataset and training code.

Key Capabilities & Innovations

  • Capability-Oriented Dataset: Trained on a 23K sample dataset covering 9 distinct long-context task types, such as precise retrieval, numerical reasoning, structured extraction, and summarization. Each task incorporates natural evaluation metrics as reward functions.
  • TMN-Reweight: A proposed method to address optimization challenges from heterogeneous rewards. It combines task-level mean normalization with difficulty-adaptive weighting, providing consistent improvements over vanilla GRPO.
  • Strong Long-Context Performance: Achieves an average performance of 63.0 at the 4B scale, outperforming the closed-source QwenLong-L1.5 dataset even with its specialized AEPO algorithm. The model also preserves or improves general capabilities (MMLU-Pro, AIME24/25, GPQA) and shows substantial gains in dialogue memory benchmarks (LongMemEval +13.6).

Good For

  • Applications requiring deep understanding and reasoning over very long texts.
  • Research and development in reinforcement learning for language models.
  • Tasks involving complex information retrieval, structured data extraction, and multi-document summarization.
  • Developers interested in open-source long-context models and their training methodologies.