Name: IEITYuan/Yuan-embedding-2.0-zh API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: IEITYuan

Yuan-embedding-2.0-zh Overview

Yuan-embedding-2.0-zh is a 0.3-billion-parameter embedding model from IEITYuan, built specifically for Chinese text retrieval and reranking. It is an updated version of Yuan-embedding-1.0, with improvements to both the training-data pipeline and the loss-function design, as described below.

Key Capabilities

Specialized for Chinese Text: Trained and optimized specifically for Chinese-language text.
Enhanced Retrieval: Trained with an InfoNCE contrastive loss using in-batch negatives, where the other passages in each training batch serve as negatives for a given query.
Optimized Reranking: Incorporates a Margin-Adaptive Pairwise Ranking Loss to improve the accuracy of document reranking.
Advanced Data Augmentation: Uses hard-negative sampling, with both reranker models and LLMs selecting high-quality negatives, and adds LLM-synthesized training data in which Yuan2-M32 rewrites queries.
Multi-Task Learning: Trained with a multi-task loss function and Matryoshka Representation Learning (MRL), which trains the model so that the leading dimensions of each embedding also work as a shorter embedding.
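The retrieval objective listed above can be illustrated with a minimal sketch of InfoNCE with in-batch negatives. It assumes L2-normalized embeddings scored by cosine similarity and a temperature of 0.05; both are illustrative choices, not values taken from the Yuan-embedding-2.0-zh training setup, and the helper names are hypothetical. Query i is paired with passage i, and every other passage in the batch acts as a negative for that query.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce_in_batch(queries: List[Vector], positives: List[Vector],
                      temperature: float = 0.05) -> float:
    """Mean InfoNCE loss over a batch (hypothetical helper).

    positives[i] is the positive passage for queries[i]; every other
    passage in the batch is treated as a negative for that query.
    The 0.05 temperature is an illustrative assumption.
    """
    if len(queries) != len(positives) or not queries:
        raise ValueError("need a non-empty batch of aligned query/passage pairs")
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every passage in the batch, divided by temperature.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all passages (positive + in-batch negatives).
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching passage (index i) as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)


# Toy batch: each query points roughly in the same direction as its own passage.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce_in_batch(queries, passages))
```

The loss approaches zero when each query is far more similar to its own passage than to the others in the batch, and grows when a negative outscores the positive. Larger batches supply more negatives per query at no extra labeling cost, which is the usual motivation for in-batch negatives.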

Good For

Semantic Search Systems: Building or improving search engines that need to match Chinese queries and documents by meaning rather than by exact keywords.
Information Retrieval: Finding relevant Chinese documents in large corpora.
Document Ranking: Reordering search results or recommendations so that the most relevant Chinese content appears first.
Chinese NLP Applications: Any application requiring high-quality, context-aware embeddings for Chinese text.
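For semantic search and document ranking, a common pattern is to embed the query and the candidate documents, then sort the documents by cosine similarity. This sketch uses small hypothetical vectors in place of real model output, because this page does not document the API request or response format. The optional `dim` argument shows how Matryoshka Representation Learning is typically used: keep only the first `dim` components and renormalize. Which truncation sizes this model actually supports is an assumption to check against the full model card.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def truncate_and_normalize(v: Vector, dim: int) -> Vector:
    """Keep the first `dim` components (Matryoshka-style) and rescale to unit length."""
    head = v[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector is all zeros")
    return [x / norm for x in head]


def rank_documents(query: Vector, docs: List[Vector],
                   dim: Optional[int] = None) -> List[Tuple[int, float]]:
    """Return (doc_index, cosine_similarity) pairs, most similar first."""
    d = len(query) if dim is None else dim
    q = truncate_and_normalize(query, d)
    scored = []
    for idx, doc in enumerate(docs):
        dv = truncate_and_normalize(doc, d)
        scored.append((idx, sum(a * b for a, b in zip(q, dv))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical 4-dimensional vectors standing in for real model embeddings.
query = [0.9, 0.1, 0.3, 0.0]
docs = [
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.2, 0.4, 0.1],
    [0.5, 0.5, 0.0, 0.0],
]
print(rank_documents(query, docs))         # all 4 dimensions
print(rank_documents(query, docs, dim=2))  # first 2 dimensions only
```

Truncating to fewer dimensions reduces storage and scoring cost. With a real MRL-trained model, rankings can shift somewhat at small `dim`, so it is worth measuring retrieval quality on your own data before choosing a size.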
