# CodeRankLLM: An LLM for Listwise Code Reranking
CodeRankLLM is a 7.6-billion-parameter language model developed by nomic-ai, fine-tuned specifically for listwise code reranking. It significantly improves the quality of retrieved results across code retrieval tasks, especially when paired with a strong first-stage code retriever such as CodeRankEmbed.
## Key Capabilities
- Listwise Code Reranking: Optimizes the ordering of multiple code passages simultaneously, improving the relevance of search results.
- Enhanced Code Retrieval: Works downstream of a first-stage code retriever, refining its candidate list to improve final retrieval accuracy.
- Foundation Model: Initialized from the robust Qwen2.5-Coder-7B-Instruct model.
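To make the listwise setting concrete, here is a minimal sketch of the surrounding plumbing: numbering candidate snippets into a prompt and parsing a ranked answer back into an ordering. The prompt wording and the `[2] > [1] > [3]` answer format are assumptions in the style of RankLLM-type rerankers, not CodeRankLLM's documented interface.

```python
import re

def build_listwise_prompt(query: str, passages: list[str]) -> str:
    """Number each candidate snippet and ask the LLM for a ranked ordering."""
    lines = [f"Rank the {len(passages)} code snippets by relevance to the query.",
             f"Query: {query}", ""]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("")
    lines.append("Answer with identifiers only, e.g. [2] > [1] > [3].")
    return "\n".join(lines)

def parse_ranking(response: str, num_passages: int) -> list[int]:
    """Parse '[2] > [1] > [3]' into 0-based indices, dropping malformed ids."""
    ids = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", response)]
    seen, order = set(), []
    for i in ids:
        if 0 <= i < num_passages and i not in seen:
            seen.add(i)
            order.append(i)
    # Append any candidates the model omitted, preserving original order.
    order += [i for i in range(num_passages) if i not in seen]
    return order

passages = ["def add(a, b): return a + b",
            "def mul(a, b): return a * b",
            "def sub(a, b): return a - b"]
prompt = build_listwise_prompt("function that multiplies two numbers", passages)
# Parsing a hypothetical model response:
order = parse_ranking("[2] > [1] > [3]", len(passages))
print(order)  # [1, 0, 2]
```

Ranking over the whole candidate list in one generation is what distinguishes listwise reranking from pointwise scoring, where each snippet would be judged in isolation.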
## Training Details
The model's training data for listwise reranking was generated from 50,000 `<query, positive, negatives>` tuples drawn from the high-quality CoRNStack dataset. Ranking supervision was provided by Qwen2.5-32B-Instruct, which generated a ranked ordering for each example. Fine-tuning then used a standard language modeling objective, minimizing next-token prediction error over these ranked outputs.
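The step above can be sketched as follows: shuffle the candidates from one tuple, ask a teacher model for an ordering, and emit a (prompt, target) pair for next-token fine-tuning. The field names, the `[i] > [j]` target format, and the toy teacher are illustrative assumptions, not the released training code.

```python
import random

def make_training_example(query, positive, negatives, teacher_rank_fn, seed=0):
    """Build one (prompt, target) pair for language-modeling fine-tuning."""
    candidates = [positive] + list(negatives)
    rng = random.Random(seed)
    rng.shuffle(candidates)  # avoid position bias toward the positive
    # In the paper's setup the teacher would be Qwen2.5-32B-Instruct.
    teacher_order = teacher_rank_fn(query, candidates)
    prompt = "\n".join([f"Query: {query}"] +
                       [f"[{i + 1}] {c}" for i, c in enumerate(candidates)])
    target = " > ".join(f"[{i + 1}]" for i in teacher_order)
    return prompt, target

# Toy teacher stand-in: rank snippets containing the query's last term first.
def toy_teacher(query, candidates):
    return sorted(range(len(candidates)),
                  key=lambda i: query.split()[-1] not in candidates[i])

prompt, target = make_training_example(
    "binary search", "def binary_search(xs, x): ...",
    ["def bubble_sort(xs): ...", "def linear_scan(xs, x): ..."], toy_teacher)
```

The cross-entropy loss is then computed only on the target tokens, so the model learns to emit the teacher's ordering conditioned on the candidate list.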
## Good For
- Improving the relevance and ranking of code search results.
- Applications requiring precise ordering of code snippets based on a query.
- Developers and researchers working on code intelligence and retrieval systems.