Alibaba-NLP/ERank-32B
Alibaba-NLP/ERank-32B is a 32-billion-parameter pointwise text reranker developed by Alibaba-NLP for effective and efficient document relevance scoring. It is trained with a two-stage pipeline: Supervised Fine-Tuning (SFT) teaches the model to generate integer relevance scores, and Reinforcement Learning (RL) then optimizes it with a listwise-derived reward. ERank-32B performs well on reasoning-intensive reranking tasks, outperforming many listwise rerankers while keeping latency low thanks to its pointwise architecture. It supports custom input instructions and has a context length of 128K tokens.
ERank-32B: An Effective and Efficient Text Reranker
ERank-32B, developed by Alibaba-NLP, is a 32-billion-parameter pointwise reranker built from a reasoning LLM. It is designed for high effectiveness and low latency across diverse relevance scenarios, and it outperforms recent listwise rerankers on challenging reasoning-intensive tasks.
Key Capabilities and Training
- Two-Stage Training: ERank is trained with a two-stage pipeline:
  - Supervised Fine-Tuning (SFT): Unlike traditional rerankers, ERank is trained to generate fine-grained integer relevance scores as text.
  - Reinforcement Learning (RL): A listwise-derived reward instills global ranking awareness into the efficient pointwise architecture.
- Instruction Awareness: The model supports customizing input instructions for different tasks.
- Long Context: Supports sequences of up to 128K tokens.
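The pointwise, generative scoring described above can be sketched as follows. This is a minimal illustration rather than ERank's actual interface: the prompt template in `build_prompt`, the "last integer wins" rule in `parse_score`, and the `generate` callable (a stand-in for any text-generation backend) are all assumptions made for this example.

```python
import re
from typing import Callable, List, Tuple


def build_prompt(instruction: str, query: str, document: str) -> str:
    # Hypothetical template; the model's real prompt format may differ.
    return (
        f"Instruction: {instruction}\n"
        f"Query: {query}\n"
        f"Document: {document}\n"
        "Rate the relevance of the document to the query as an integer."
    )


def parse_score(generated: str, default: int = 0) -> int:
    # Assumption: the final integer in the generated text is the score,
    # so any digits inside intermediate reasoning are ignored.
    matches = re.findall(r"-?\d+", generated)
    return int(matches[-1]) if matches else default


def rerank(
    instruction: str,
    query: str,
    documents: List[str],
    generate: Callable[[str], str],
) -> List[Tuple[str, int]]:
    # Pointwise: each document is scored on its own, then sorted by score.
    scored = [
        (doc, parse_score(generate(build_prompt(instruction, query, doc))))
        for doc in documents
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the instruction is just another field of the prompt, switching tasks only requires passing a different `instruction` string; the scoring and sorting logic stays the same.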
Performance and Efficiency
ERank-32B demonstrates strong performance across various benchmarks:
- Reasoning-Intensive Tasks: Achieves an average score of 38.1 across the BRIGHT, FollowIR, BEIR, and TREC DL benchmarks, including 24.4 on BRIGHT and 12.1 on FollowIR, surpassing other pointwise rerankers and many listwise ones.
- State-of-the-Art on BRIGHT: When its scores are combined with BM25 scores in a hybrid setup, ERank-32B reaches a state-of-the-art nDCG@10 of 40.2 on the challenging BRIGHT benchmark.
- Low Latency: As a pointwise reranker, ERank scores each query-document pair independently, so candidates can be batched or scored in parallel. This gives it significantly lower latency than listwise models and makes it well suited to real-time applications.
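For readers unfamiliar with the nDCG@10 metric cited above, here is a minimal sketch of how it is computed for a single query. It uses the linear-gain form of DCG; some evaluation toolkits use an exponential gain instead, and benchmark tables usually report the value scaled to 0-100 and averaged over queries. The function names are chosen for this example only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    # Discounted cumulative gain: each graded label is divided by
    # log2(position + 1), with positions starting at 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    # Normalize by the DCG of the ideal ordering of the same labels.
    # Simplification: a full evaluation builds the ideal ranking from all
    # judged documents for the query, not only the ones that were returned.
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0
```

The input is the list of relevance labels in the order the reranker returned the documents, so a perfect ordering yields 1.0 and placing the only relevant document second out of two yields roughly 0.63.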
Usage
ERank-32B can be run with Hugging Face Transformers or vLLM for inference. Given a query and an instruction, it reranks candidate documents, and its scores can optionally be combined with first-stage retriever scores (hybrid scoring).
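One common way to combine reranker scores with first-stage retriever scores is to normalize both and take a weighted sum. The sketch below illustrates that idea only; the min-max normalization, the `alpha` weight, and the function names are assumptions for this example, and the combination actually used with ERank-32B may differ.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    # Rescale scores to [0, 1]; identical scores carry no ranking signal.
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    rerank_scores: Sequence[float],
    retriever_scores: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight on the reranker score
) -> List[float]:
    # Both lists must describe the same candidates in the same order.
    if len(rerank_scores) != len(retriever_scores):
        raise ValueError("score lists must have the same length")
    reranker = min_max(rerank_scores)
    retriever = min_max(retriever_scores)
    return [alpha * r + (1 - alpha) * f for r, f in zip(reranker, retriever)]
```

Normalizing first matters because integer reranker scores and retriever scores such as BM25 live on different scales; without it, whichever has the larger range would dominate the sum.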