TencentBAC/Conan-embedding-v1

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 0.3B | Quant: BF16 | Ctx Length: 32k | Published: Aug 22, 2024 | License: cc-by-nc-4.0 | Architecture: Transformer | 0.2K | Open Weights | Cold

Conan-embedding-v1 is a 0.3 billion parameter general text embedding model developed by Tencent BAC Group. It is designed for various natural language processing tasks, demonstrating strong performance across classification, clustering, reranking, retrieval, and semantic textual similarity benchmarks. The model achieves competitive results, particularly in reranking and retrieval, making it suitable for applications requiring robust text representation.


Conan-embedding-v1: General Text Embedding Model

Conan-embedding-v1 is a 0.3 billion parameter text embedding model developed by the Tencent BAC Group, designed for general-purpose text representation. This model demonstrates competitive performance across a suite of embedding benchmarks, including classification (CLS), clustering, reranking, retrieval, semantic textual similarity (STS), and pair classification (Pair_CLS).

Key Capabilities and Performance

  • Strong Benchmark Results: Conan-embedding-v1 achieves an average score of 72.62 across the embedding tasks listed above, ahead of models such as gte-Qwen2-7B-instruct and xiaobu-embedding-v2 on that overall average.
  • Excellent Reranking and Retrieval: The model shows particular strength in reranking (72.76) and retrieval (76.67) tasks, indicating its effectiveness in identifying relevant documents or passages.
  • Efficient Size: With 0.3 billion parameters, it offers a compact solution for generating high-quality text embeddings.
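As a rough illustration of how the retrieval and reranking strengths noted above are typically put to use, the sketch below ranks passages against a query by the cosine similarity of their embeddings. It is a minimal sketch under stated assumptions, not the model's own API: `toy_embed` is a hypothetical hashed bag-of-words stand-in so the example runs without downloading any weights, and `rank_passages` and its `embed` parameter are names chosen here for illustration. In practice, `embed` would be swapped for a function that calls Conan-embedding-v1.

```python
import math
import zlib
from typing import Callable, List, Sequence, Tuple


def toy_embed(text: str, dim: int = 256) -> List[float]:
    """Hypothetical stand-in embedder: hashed bag of words, L2-normalized.

    This is NOT Conan-embedding-v1; it only makes the sketch runnable offline.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_passages(
    query: str,
    passages: Sequence[str],
    embed: Callable[[str], Sequence[float]] = toy_embed,
) -> List[Tuple[float, str]]:
    """Return (score, passage) pairs sorted from most to least similar."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(p)), p) for p in passages]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "how to reset your account password",
        "weekend weather forecast for the coast",
        "steps to change a forgotten password",
    ]
    for score, passage in rank_passages("reset password", docs):
        print(f"{score:.3f}  {passage}")
```

Passing the embedder in as a function keeps the ranking logic independent of how the vectors are produced, so the same code can serve as a first-stage retriever or as a reranker over a short candidate list.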

Training Details

The training methodology and specific training details for Conan-embedding-v1 are described in its associated technical report.

Good For

  • Information Retrieval Systems: Its strong retrieval performance makes it suitable for search engines, question-answering systems, and document similarity tasks.
  • Semantic Search: Effective for applications where understanding the semantic meaning of text is crucial for matching queries to relevant content.
  • Text Classification and Clustering: Provides robust embeddings that can enhance the performance of downstream classification and clustering algorithms.
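To make the classification use case above more concrete, here is a minimal sketch of a nearest-centroid classifier that works on precomputed embedding vectors. It assumes the embeddings have already been produced (by Conan-embedding-v1 or any other embedder); the function names `fit_centroids` and `classify` and the tiny two-dimensional vectors are hypothetical, chosen only for illustration, and this is one simple downstream approach rather than a method described by the model's authors.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(examples: Sequence[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the embedding vectors of each label into one centroid per label."""
    by_label: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in examples:
        by_label[label].append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def classify(vec: Vector, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to the given vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings (illustrative only).
    training = [
        ([1.0, 0.0], "sports"),
        ([0.9, 0.1], "sports"),
        ([0.0, 1.0], "finance"),
    ]
    centroids = fit_centroids(training)
    print(classify([0.8, 0.2], centroids))
```

Because each class is summarized by a single averaged vector, this approach needs only a handful of labeled examples per class, though a trained classifier on top of the embeddings may do better when more labeled data is available.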