Name: BAAI/bge-code-v1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: BAAI

BAAI/bge-code-v1: A Generalist Code Embedding Model

BAAI/bge-code-v1 is a 1.5-billion-parameter, LLM-based embedding model developed by BAAI for retrieval tasks. It handles code retrieval, general text retrieval, and multilingual queries in a single model.

Key Capabilities

Superior Code Retrieval: Supports natural language queries in English and Chinese and covers 20 programming languages when retrieving relevant code snippets.
Robust Text Retrieval: Maintains text retrieval performance comparable to dedicated text embedding models of similar scale.
Extensive Multilingual Support: Retrieves across multiple languages, including English, Chinese, Japanese, and French.

Performance Highlights

BGE-Code-v1 achieves state-of-the-art results on key benchmarks:

CoIR Benchmark: Achieves an average score of 81.77, outperforming models like Voyage-Code-003 (78.53) and CodeXEmbed-7B (78.20).
CodeRAG Benchmark: Scores an average of 72.8, surpassing SFR (67.0) and Jina-v2-code (65.4). Individual task scores include DS-1000 (40.9) and SWE-bench-Lite (67.4).
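
To make the size of these leads concrete, the short sketch below computes the gap between BGE-Code-v1 and each compared model. It uses only the average scores quoted above; the `margins` helper and the dictionary names are illustrative and not part of any published tooling.

```python
# Average scores as quoted in the listing above.
coir = {"BGE-Code-v1": 81.77, "Voyage-Code-003": 78.53, "CodeXEmbed-7B": 78.20}
coderag = {"BGE-Code-v1": 72.8, "SFR": 67.0, "Jina-v2-code": 65.4}


def margins(scores, model="BGE-Code-v1"):
    """Return the point lead of `model` over every other entry (hypothetical helper)."""
    base = scores[model]
    return {
        name: round(base - score, 2)
        for name, score in scores.items()
        if name != model
    }


coir_margins = margins(coir)        # {'Voyage-Code-003': 3.24, 'CodeXEmbed-7B': 3.57}
coderag_margins = margins(coderag)  # {'SFR': 5.8, 'Jina-v2-code': 7.4}
print(coir_margins)
print(coderag_margins)
```

On these figures the lead is roughly 3 to 4 points on CoIR and 6 to 7 points on CodeRAG.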

Good For

Code Search and Retrieval: Ideal for developers needing to find code based on natural language descriptions or code snippets.
Text-to-Code Matching: Excellent for tasks like converting natural language questions into SQL queries or retrieving code implementations from textual explanations.
Multilingual Applications: Suitable for projects requiring code and text retrieval across diverse language sets.
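
The use cases above all reduce to the same step: embed a query and a set of candidate snippets, then rank the candidates by similarity to the query. The following is a minimal sketch of that ranking step only, assuming cosine similarity as the scoring function. The three-dimensional vectors are made up for illustration; in practice they would be embeddings produced by the model, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_snippets(
    query_vec: Sequence[float], snippet_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (snippet index, score) pairs, best match first."""
    scored = [
        (i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(snippet_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors standing in for real embeddings (assumption, not model output).
query_vec = [0.9, 0.1, 0.0]
snippet_vecs = [
    [0.0, 1.0, 0.0],  # snippet 0: points in a different direction
    [0.8, 0.2, 0.1],  # snippet 1: close to the query direction
    [0.1, 0.0, 0.9],  # snippet 2: points in a different direction
]

ranking = rank_snippets(query_vec, snippet_vecs)
best_index = ranking[0][0]
print(best_index)
```

With real embeddings the same ranking logic applies unchanged; only the source of the vectors differs. For large collections, a vector index would typically replace the linear scan shown here.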
