Name: nomic-ai/CodeRankEmbed API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nomic-ai

CodeRankEmbed: High-Performance Code Retrieval

CodeRankEmbed is a 137-million-parameter bi-encoder model developed by nomic-ai for code retrieval. It supports a context length of 8192 tokens, which makes it suitable for longer code snippets and complex queries. The model is initialized from Arctic-Embed-M-Long and contrastively fine-tuned with InfoNCE loss on the CoRNStack dataset of 21 million examples.
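To make the InfoNCE objective concrete, here is a minimal sketch in plain Python. It assumes in-batch negatives (each query's matching code sits on the diagonal of a similarity matrix) and a temperature of 0.05; both are illustrative assumptions, not details confirmed by this page, and the function name is hypothetical.

```python
import math


def info_nce(similarities, temperature=0.05):
    """Mean InfoNCE loss over a square similarity matrix.

    similarities[i][j] is the similarity between query i and code j.
    Assumption: the positive pair for query i is at column i, and every
    other column in the row acts as an in-batch negative.
    """
    losses = []
    for i, row in enumerate(similarities):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # -log(softmax probability assigned to the positive pair)
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Positives score higher than negatives, so the loss is small.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]])
# Positives score lower than negatives, so the loss is large.
misaligned = info_nce([[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as each query's similarity to its own code rises relative to the other snippets in the batch, which is the behavior contrastive fine-tuning optimizes for.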

Key Capabilities

State-of-the-art Code Embedding: Achieves an MRR of 77.9 on CSN (CodeSearchNet) and an NDCG@10 of 60.1 on CoIR, outperforming models such as CodeSage, Jina-Code-v2, CodeT5+, OpenAI-Ada-002, and Voyage-Code-002.
Efficient Code Search: Generates embeddings for both queries and code, so relevant code can be found by semantic similarity between the two.
Long Context Support: The 8192-token context window accommodates long code snippets and detailed queries.
Integration with Re-rankers: Can be combined with re-rankers like CodeRankLLM for enhanced retrieval quality.
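The retrieval step described above can be sketched as ranking code embeddings by cosine similarity to a query embedding, and the MRR metric as the average reciprocal rank of the correct snippet. The vectors below are toy 3-dimensional stand-ins, not real CodeRankEmbed outputs, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def rank_snippets(query_vec, snippet_vecs):
    """Return snippet indices ordered from most to least similar."""
    scores = [cosine(query_vec, v) for v in snippet_vecs]
    return sorted(range(len(snippet_vecs)), key=lambda i: scores[i], reverse=True)


def mean_reciprocal_rank(rankings, relevant):
    """rankings[q] is a ranked index list; relevant[q] is the correct index."""
    total = sum(1.0 / (ranking.index(target) + 1)
                for ranking, target in zip(rankings, relevant))
    return total / len(rankings)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
snippets = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
order = rank_snippets(query, snippets)
mrr = mean_reciprocal_rank([order], [1])
```

In a two-stage setup, the top few indices in `order` would be the candidates handed to a re-ranker such as CodeRankLLM.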

Usage Notes

Queries must include the task instruction prefix: "Represent this query for searching relevant code".
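A small helper can keep the prefix from being forgotten. This is a sketch under stated assumptions: the `": "` separator between the instruction and the query, and loading the model through the sentence-transformers library with `trust_remote_code=True`, are not specified on this page. The helper names are hypothetical.

```python
QUERY_PREFIX = "Represent this query for searching relevant code"


def with_query_prefix(query: str) -> str:
    """Prepend the task instruction to a query.

    Assumption: the instruction and the query are joined with ': '.
    """
    return f"{QUERY_PREFIX}: {query.strip()}"


def embed(queries, code_snippets):
    """Embed prefixed queries and unprefixed code snippets.

    Assumption: the model loads through sentence-transformers; the import
    is done lazily so the prefix helper works without that dependency.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("nomic-ai/CodeRankEmbed", trust_remote_code=True)
    query_vecs = model.encode([with_query_prefix(q) for q in queries])
    code_vecs = model.encode(list(code_snippets))
    return query_vecs, code_vecs


prefixed = with_query_prefix("compute the n-th factorial")
```

Only the query side receives the instruction here; the code snippets are passed to the model unchanged.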

Good For

Developers building intelligent code search engines.
Systems requiring high-accuracy code retrieval from large repositories.
Applications needing to find relevant code snippets based on natural language queries.
