IEITYuan/Yuan-embedding-2.0-zh

Text Generation | Concurrency Cost: 1 | Model Size: 0.3B | Quant: BF16 | Ctx Length: 32k | Published: Nov 24, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

Yuan-embedding-2.0-zh is a 0.3-billion-parameter embedding model developed by IEITYuan and optimized for Chinese text retrieval and reranking. Building on Yuan-embedding-1.0, it adds data augmentation through hard negative sampling and LLM-synthesized data generated with Yuan2-M32. Training uses a multi-task loss function together with Matryoshka Representation Learning: InfoNCE with in-batch negatives for retrieval tasks and a Margin-Adaptive Pairwise Ranking Loss for reranking tasks. The model targets semantic search and document ranking in Chinese.

Yuan-embedding-2.0-zh Overview

Yuan-embedding-2.0-zh is IEITYuan's 0.3-billion-parameter embedding model for Chinese text retrieval and reranking. It is an optimized iteration of its predecessor, Yuan-embedding-1.0, with the main changes in data processing and loss function design.

Key Capabilities

  • Specialized for Chinese Text: Trained and optimized specifically for Chinese-language retrieval and reranking, building on Yuan-embedding-1.0.
  • Enhanced Retrieval: Utilizes InfoNCE with in-batch negatives for robust retrieval performance.
  • Optimized Reranking: Incorporates a Margin-Adaptive Pairwise Ranking Loss to improve the accuracy of document reranking.
  • Advanced Data Augmentation: Uses hard negative sampling, with both reranking models and LLMs selecting high-quality samples, and LLM-synthesized data, with Yuan2-M32 used for query rewriting.
  • Multi-Task Learning: Trained with a multi-task loss function and Matryoshka Representation Learning, which trains the leading dimensions of each embedding to remain usable on their own so that vectors can be truncated to smaller sizes.
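
The two training objectives named above can be illustrated with a minimal sketch in plain Python. It is not the authors' training code: the temperature of 0.05, the fixed margin of 0.1, and the function names are all assumptions chosen for illustration. In particular, the source does not say how the model adapts its margin, so the ranking loss below uses a plain fixed-margin hinge as a stand-in.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_in_batch(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    passages[i] is the positive for queries[i]; every other passage in the
    batch acts as a negative. The temperature value is an assumption.
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(q)


def pairwise_margin_loss(pos_score, neg_score, margin=0.1):
    """Fixed-margin pairwise hinge loss (stand-in; the adaptive margin rule is not given)."""
    return max(0.0, margin - (pos_score - neg_score))


# Toy batch: each query is closest to the passage at the same index.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]

loss_aligned = info_nce_in_batch(queries, aligned)
loss_swapped = info_nce_in_batch(queries, swapped)
print(loss_aligned < loss_swapped)
```

The InfoNCE loss is lower when each query sits closest to its own passage. The pairwise loss is zero once the positive outscores the negative by at least the margin.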

Good For

  • Semantic Search Systems: Building or enhancing search engines that need to match Chinese queries and documents by meaning rather than by keywords.
  • Information Retrieval: Finding relevant Chinese documents in large corpora.
  • Document Ranking: Reordering search results or recommendations so that the most relevant Chinese content appears first.
  • Chinese NLP Applications: Other applications that need high-quality, context-aware embeddings for Chinese text.
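
As a rough illustration of the search and ranking use cases above, the sketch below ranks documents by cosine similarity, optionally after truncating embeddings to their leading dimensions in the Matryoshka style. The vectors are made-up toy values rather than model outputs, and the helper names and dimensions are hypothetical. The source does not state which truncated sizes the model supports, and obtaining real embeddings from the model is outside this sketch.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def rank_documents(query_vec, doc_vecs, dim):
    """Return document indices sorted by cosine similarity to the query, best first."""
    q = truncate_and_normalize(query_vec, dim)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = truncate_and_normalize(doc, dim)
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


# Toy 4-dimensional embeddings standing in for real model outputs.
query_vec = [0.9, 0.1, 0.3, 0.0]
doc_vecs = [
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.2, 0.4, 0.1],
    [0.5, 0.5, 0.1, 0.9],
]

full_ranking = rank_documents(query_vec, doc_vecs, dim=4)
short_ranking = rank_documents(query_vec, doc_vecs, dim=2)
print(full_ranking, short_ranking)
```

Truncated vectors must be re-normalized before comparison, because the leading components of a unit vector are no longer unit length on their own. In this toy data the truncated ranking matches the full one. With real embeddings, shorter vectors generally trade some accuracy for lower storage and faster search.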