nomic-ai/CodeRankLLM
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Nov 8, 2024 | License: MIT | Architecture: Transformer | Open Weights

CodeRankLLM is a 7.6-billion-parameter language model developed by nomic-ai and fine-tuned for listwise code reranking. Paired with a performant code retriever, it reorders the retrieved code snippets so that the results most relevant to a query appear first. It was initialized from Qwen2.5-Coder-7B-Instruct and trained with a language modeling objective.


CodeRankLLM: An LLM for Listwise Code Reranking

CodeRankLLM is a 7.6-billion-parameter language model developed by nomic-ai and fine-tuned for listwise code reranking. It improves the quality of retrieved results across code retrieval tasks, especially when paired with an effective code retriever such as CodeRankEmbed.

Key Capabilities

  • Listwise Code Reranking: Orders a list of candidate code passages jointly for a given query, rather than scoring each passage in isolation, improving the relevance of search results.
  • Enhanced Code Retrieval: Designed to work in conjunction with code retrievers to refine and boost the accuracy of retrieved code snippets.
  • Base Model: Initialized from Qwen2.5-Coder-7B-Instruct.
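
The listwise setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's documented interface: the prompt wording, the `[1] > [2]` answer format, and the helper names `build_listwise_prompt`, `parse_ranking`, and `rerank` are all assumptions made for the example. The sketch numbers the candidates returned by a retriever, and then maps a ranking string (such as a reranker might generate) back onto the original passages.

```python
import re


def build_listwise_prompt(query, passages):
    """Number each candidate so a reranker can refer to it by identifier.

    The wording below is an assumed prompt, not CodeRankLLM's actual template.
    """
    lines = [
        f"Rank the {len(passages)} code snippets below by relevance to the query.",
        f"Query: {query}",
        "",
    ]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("")
    lines.append("Answer with identifiers only, most relevant first, e.g. [2] > [1].")
    return "\n".join(lines)


def parse_ranking(response, num_passages):
    """Turn a string like '[2] > [1] > [3]' into zero-based indices.

    Duplicate and out-of-range identifiers are ignored.
    """
    order = []
    for token in re.findall(r"\[(\d+)\]", response):
        idx = int(token) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    # Append anything the response omitted, keeping the retriever's order.
    order.extend(i for i in range(num_passages) if i not in order)
    return order


def rerank(passages, response):
    """Reorder the retrieved passages according to the parsed ranking."""
    return [passages[i] for i in parse_ranking(response, len(passages))]
```

In a real pipeline the `response` string would come from the model's generated text. The fallback in `parse_ranking` is a design choice for robustness: if the output is malformed or incomplete, the retriever's original order is preserved for the unmentioned candidates.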

Training Details

The listwise reranking training data was built from 50,000 <query, positive, negatives> tuples drawn from the CoRNStack dataset. For ranking supervision, the Qwen-2.5-32B-Instruct LLM generated a ranked ordering of the candidates in each example. Fine-tuning used a language modeling objective, minimizing the next-token prediction error over the sequence.
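
To make the data construction concrete, here is a hedged sketch of how one <query, positive, negatives> tuple and a teacher-provided ordering might be turned into a (prompt, target) text pair for language-model fine-tuning. The function name `make_training_example`, the prompt layout, the `[i] > [j]` target format, and the candidate shuffling are assumptions for illustration only; the source does not specify the exact format used.

```python
import random


def make_training_example(query, positive, negatives, teacher_order, seed=0):
    """Build one (prompt, target) pair from a tuple and a teacher ranking.

    teacher_order lists the same candidates from most to least relevant; it
    stands in for the ordering produced by the teacher LLM. Candidates are
    assumed to be distinct strings.
    """
    candidates = [positive] + list(negatives)
    # Assumed step: shuffle so the positive is not always shown first.
    rng = random.Random(seed)
    rng.shuffle(candidates)

    prompt_lines = [f"Query: {query}"]
    prompt_lines += [f"[{i}] {c}" for i, c in enumerate(candidates, start=1)]

    # Target text: identifiers of the shuffled candidates, in the teacher's order.
    target = " > ".join(f"[{candidates.index(c) + 1}]" for c in teacher_order)
    return "\n".join(prompt_lines), target
```

With pairs like these, a next-token objective would train the model to emit the target ranking string given the prompt. Whether the loss is applied to the target tokens only or to the full sequence is not stated in the source.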

Good For

  • Improving the relevance and ranking of code search results.
  • Applications requiring precise ordering of code snippets based on a query.
  • Developers and researchers working on code intelligence and retrieval systems.