Name: Kwai-Klear/GoLongRL-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Kwai-Klear

GoLongRL-4B: Long-Context Reinforcement Learning

GoLongRL-4B is a 4-billion-parameter model from Kwai-Klear, designed for long-context reinforcement learning with verifiable rewards (RLVR). It introduces a post-training recipe that significantly improves performance on tasks requiring understanding and processing of extensive context. The framework is fully open-source, including its dataset and training code.

Key Capabilities & Innovations

Capability-Oriented Dataset: Trained on a 23K-sample dataset covering 9 distinct long-context task types, such as precise retrieval, numerical reasoning, structured extraction, and summarization. Each task uses its natural evaluation metric as the reward function.
TMN-Reweight: A proposed method to address optimization challenges from heterogeneous rewards. It combines task-level mean normalization with difficulty-adaptive weighting, providing consistent improvements over vanilla GRPO.
Strong Long-Context Performance: Achieves an average score of 63.0 at the 4B scale, outperforming training on the closed-source QwenLong-L1.5 dataset even when that dataset is paired with its specialized AEPO algorithm. The model also preserves or improves general capabilities (MMLU-Pro, AIME24/25, GPQA) and shows substantial gains on dialogue memory benchmarks (LongMemEval +13.6).
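
To make the ideas above more concrete, here is a minimal Python sketch of how heterogeneous task rewards might be combined with task-level mean normalization and difficulty-adaptive weighting. This is an illustration under stated assumptions, not the published TMN-Reweight formula: the function names (exact_match, token_f1, reweighted_advantages), the choice to divide each reward by its task mean, and the difficulty weight of one minus the prompt's mean reward are all hypothetical. Rewards are assumed to lie in [0, 1].

```python
from collections import Counter, defaultdict
from statistics import mean


def exact_match(pred: str, gold: str) -> float:
    """Example reward for retrieval-style tasks: 1.0 on a case-insensitive match."""
    return 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0


def token_f1(pred: str, gold: str) -> float:
    """Example reward for extraction-style tasks: token-level F1 overlap."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def reweighted_advantages(groups, eps=1e-8):
    """Hypothetical sketch, not the authors' exact method.

    groups: list of {"task": str, "rewards": [float, ...]}, one entry per
    prompt, holding the rewards of that prompt's sampled rollouts.
    """
    # Task-level mean: average reward over every rollout of the same task.
    per_task = defaultdict(list)
    for g in groups:
        per_task[g["task"]].extend(g["rewards"])
    task_mean = {t: mean(rs) for t, rs in per_task.items()}

    advantages = []
    for g in groups:
        # Assumption: normalize by dividing by the task mean, so tasks with
        # differently scaled metrics become comparable.
        scaled = [r / (task_mean[g["task"]] + eps) for r in g["rewards"]]
        # GRPO-style baseline: center on the mean of the prompt's own rollouts.
        baseline = mean(scaled)
        # Assumption: harder prompts (lower mean reward) get a larger weight.
        weight = 1.0 - mean(g["rewards"])
        advantages.append([weight * (s - baseline) for s in scaled])
    return advantages
```

In this sketch, a summarization prompt with low ROUGE-like rewards and a retrieval prompt with binary rewards both end up with advantages centered on zero, and the harder prompt contributes more to the update.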

Good For

Applications requiring deep understanding and reasoning over very long texts.
Research and development in reinforcement learning for language models.
Tasks involving complex information retrieval, structured data extraction, and multi-document summarization.
Developers interested in open-source long-context models and their training methodologies.
