Conan-embedding-v1: General Text Embedding Model

Conan-embedding-v1 is a 0.3-billion-parameter text embedding model developed by the Tencent BAC Group for general-purpose text representation. It performs competitively across six categories of embedding benchmark tasks: classification (CLS), clustering, reranking, retrieval, semantic textual similarity (STS), and pair classification (Pair_CLS).

Key Capabilities and Performance

Strong Benchmark Results: Conan-embedding-v1 achieves an average score of 72.62 across these task categories, a higher overall average than models such as gte-Qwen2-7B-instruct and xiaobu-embedding-v2.
Excellent Reranking and Retrieval: The model shows particular strength in reranking (72.76) and retrieval (76.67) tasks, indicating its effectiveness in identifying relevant documents or passages.
Efficient Size: At 0.3 billion parameters, it is a compact option for generating high-quality text embeddings.

Training Details

The training methodology and specific training details for Conan-embedding-v1 are described in its associated technical report.

Good For

Information Retrieval Systems: Its strong retrieval performance makes it suitable for search engines, question-answering systems, and document similarity tasks.
Semantic Search: Effective for applications where understanding the semantic meaning of text is crucial for matching queries to relevant content.
Text Classification and Clustering: Provides robust embeddings that can enhance the performance of downstream classification and clustering algorithms.
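
The retrieval and semantic search use cases above usually reduce to ranking documents by the similarity of their embeddings to a query embedding. The following is a minimal sketch of that ranking step only, assuming the embeddings have already been computed. The three-dimensional vectors are toy values chosen for illustration, not real model output, and the helper names `cosine_similarity` and `rank_documents` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (assumption: in a real system
# these would be produced by the embedding model and be much longer).
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
    "doc_c": [0.5, 0.5, 0.0],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With these toy values, `doc_a` ranks first and `doc_b` last. The same embeddings can also be passed as features to a downstream classifier or clustering algorithm; only the scoring step would change.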
