Name: moka-ai/m3e-large API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: moka-ai

M3E-Large: Moka Massive Mixed Embedding Model

m3e-large is a 340-million-parameter text embedding model developed by MokaAI as part of the M3E (Moka Massive Mixed Embedding) series. It converts natural-language text into dense vectors and supports both Chinese and English.

Key Capabilities

Massive Training Data: Trained on over 22 million Chinese sentence pairs, covering diverse domains like encyclopedias, finance, medicine, law, news, and academia.
Mixed Language Support: Computes similarity between homogeneous texts (sentence to sentence) and performs retrieval over heterogeneous texts (sentence to passage) in both Chinese and English.
High Performance: Achieves strong results on text classification and retrieval benchmarks. For instance, m3e-large scored an average accuracy of 0.6231 on text classification across 6 datasets and an NDCG@10 of 0.7974 on T2Ranking 1W for retrieval, results the model card compares favorably with openai-ada-002 (OpenAI's text-embedding-ada-002) on Chinese text.
All in One: Aims to provide a single model for various applications, including sentence-to-sentence similarity and sentence-to-passage retrieval.
Sentence-Transformers Compatibility: Fully compatible with sentence-transformers, so it can be used by swapping in the model name in projects that support sentence-transformers, such as Chroma, Guidance, and Semantic Kernel.
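
As a rough illustration of the sentence-transformers compatibility and the sentence-to-sentence similarity use case, the sketch below loads the model by name and compares two sentences. It assumes the sentence-transformers package is installed and that the weights can be downloaded under the name moka-ai/m3e-large; the helper names embed, cosine_similarity, and sentence_similarity are hypothetical and not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(sentences, model_name="moka-ai/m3e-large"):
    """Encode sentences into dense vectors (hypothetical helper)."""
    # Imported lazily: needs `pip install sentence-transformers`
    # and downloads the model weights on first use.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(sentences)  # one vector per input sentence


def sentence_similarity(a, b, embed_fn=embed):
    """Similarity of two sentences under the given embedding function."""
    vec_a, vec_b = embed_fn([a, b])
    return cosine_similarity(vec_a, vec_b)
```

Calling sentence_similarity("今天天气很好", "今天天气不错") would then return a score where higher values indicate closer meaning. The embed_fn parameter is only there so the similarity logic can be exercised without downloading the model.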

Good For

Chinese-centric applications: Ideal for use cases primarily involving Chinese text, with some English content.
Text Similarity: Accurately measures similarity between homogeneous texts.
Text Retrieval: Effective for retrieving relevant information from heterogeneous text sources.
Text Classification: Demonstrates strong performance in classifying Chinese texts across various domains.
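
The retrieval use case above can be sketched as ranking passage vectors by cosine similarity to a query vector. This is a minimal sketch in plain Python, not the model authors' method: in practice the vectors would come from something like SentenceTransformer("moka-ai/m3e-large").encode(...), and the function name rank_passages and the toy vectors in the usage note are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return up to top_k (passage_index, score) pairs, best match first."""
    q = _normalize(query_vec)
    scored = []
    for idx, vec in enumerate(passage_vecs):
        p = _normalize(vec)
        scored.append((idx, sum(a * b for a, b in zip(q, p))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, rank_passages([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]], top_k=2) ranks the passage at index 1 first and the passage at index 2 second. For large collections, a vector store would typically replace this linear scan.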
