BAAI/bge-code-v1
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | License: apache-2.0 | Architecture: Transformer | 0.1K | Open Weights | Cold

BAAI/bge-code-v1 is a 1.5 billion parameter LLM-based code embedding model developed by BAAI. It is designed for code retrieval: it supports natural language queries in English and Chinese and covers 20 programming languages. The model also retains strong text retrieval capabilities and multilingual support, with its best results in languages such as English, Chinese, Japanese, and French. It achieves state-of-the-art performance on both the CoIR and CodeRAG benchmarks, which makes it well suited to tasks that depend on precise code and text similarity matching.


BAAI/bge-code-v1: A Generalist Code Embedding Model

BAAI/bge-code-v1 is a 1.5 billion parameter LLM-based embedding model developed by BAAI and built for retrieval. It handles code, text, and multilingual queries within a single model.

Key Capabilities

  • Superior Code Retrieval: Supports natural language queries in English and Chinese, alongside 20 programming languages, demonstrating exceptional performance in retrieving relevant code snippets.
  • Robust Text Retrieval: Maintains strong capabilities comparable to dedicated text embedding models of similar scale.
  • Extensive Multilingual Support: Excels in retrieval across multiple languages, including English, Chinese, Japanese, and French.

Performance Highlights

BGE-Code-v1 achieves state-of-the-art results on key benchmarks:

  • CoIR Benchmark: Achieves an average score of 81.77, outperforming models like Voyage-Code-003 (78.53) and CodeXEmbed-7B (78.20).
  • CodeRAG Benchmark: Scores an average of 72.8, surpassing SFR (67.0) and Jina-v2-code (65.4), with notable performance in tasks like DS-1000 (40.9) and SWE-bench-Lite (67.4).

Good For

  • Code Search and Retrieval: Ideal for developers needing to find code based on natural language descriptions or code snippets.
  • Text-to-Code Matching: Excellent for tasks like converting natural language questions into SQL queries or retrieving code implementations from textual explanations.
  • Multilingual Applications: Suitable for projects requiring code and text retrieval across diverse language sets.
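
The code-search use case above comes down to comparing a query embedding against snippet embeddings and returning the closest matches. The following is a minimal sketch of that ranking step only, assuming cosine similarity as the scoring function. The helper names (`cosine_similarity`, `rank_snippets`) and the 3-dimensional vectors are made up for illustration; in practice the vectors would come from encoding the query and each snippet with the model, and this page does not specify the model's own loading or encoding API.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(
    query_vec: Vector, snippet_vecs: Sequence[Vector], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (snippet index, score) pairs, highest similarity first."""
    scored = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(snippet_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical corpus: three snippets in different languages.
snippets = [
    "def add(a, b): return a + b",
    "SELECT name FROM users",
    "for (int i = 0; i < n; i++) {}",
]

# Toy vectors standing in for real embeddings (assumption: invented values).
snippet_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.3, 0.9, 0.1],
]
# Pretend embedding of the query "function that sums two numbers".
query_vec = [0.8, 0.2, 0.1]

ranking = rank_snippets(query_vec, snippet_vecs, top_k=2)
for index, score in ranking:
    print(f"{score:.3f}  {snippets[index]}")
```

With real embeddings the corpus vectors would normally be computed once and stored, so that only the query needs encoding at search time.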