moka-ai/m3e-large

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 0.3B | Quant: BF16 | Ctx Length: 32k | Published: Jun 21, 2023 | Architecture: Transformer | 0.2K | Cold

The m3e-large model by MokaAI is a 0.3-billion-parameter text embedding model that converts natural language into dense vectors. It supports both Chinese and English for homogeneous text similarity calculations and heterogeneous text retrieval. Trained on over 22 million Chinese sentence pairs, it performs well on text classification and retrieval tasks and outperforms openai-ada-002 on several Chinese benchmarks.


M3E-Large: Moka Massive Mixed Embedding Model

m3e-large is a 340-million-parameter text embedding model developed by MokaAI, part of the M3E (Moka Massive Mixed Embedding) series. It transforms natural language into dense vectors and supports both Chinese and English.

Key Capabilities

  • Massive Training Data: Trained on over 22 million Chinese sentence pairs, covering diverse domains like encyclopedias, finance, medicine, law, news, and academia.
  • Mixed Language Support: Capable of calculating homogeneous text similarity and performing heterogeneous text retrieval for both Chinese and English.
  • High Performance: Achieves strong results in text classification and retrieval benchmarks. For instance, m3e-large scored an average accuracy of 0.6231 across 6 text classification datasets and an nDCG@10 of 0.7974 on T2Ranking 1W for retrieval, often surpassing openai-ada-002 on Chinese tasks.
  • All in One: Aims to provide a single model for various applications, including sentence-to-sentence similarity and sentence-to-passage retrieval.
  • Sentence-Transformers Compatibility: Fully compatible with sentence-transformers, allowing seamless integration into existing projects like Chroma, Guidance, and Semantic Kernel.
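
Because the model is published for sentence-transformers, using it usually comes down to passing the model name to `SentenceTransformer` and calling `encode`. The following is a minimal sketch, assuming the `sentence-transformers` package is installed and that the identifier `moka-ai/m3e-large` (taken from the page title) resolves on the model hub. The helper names `embed`, `cosine`, and `demo` are illustrative and not part of any library.

```python
import math
from typing import List, Sequence


def embed(texts: List[str], model_name: str = "moka-ai/m3e-large") -> List[List[float]]:
    """Encode texts into dense vectors (the model is downloaded on first use)."""
    # Imported lazily so the similarity helper below works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # assumed hub identifier
    return model.encode(texts).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def demo() -> None:
    """Compare one Chinese sentence with a paraphrase and an English sentence."""
    sentences = ["今天天气很好", "今天天气不错", "The weather is nice today"]
    vectors = embed(sentences)
    print("zh vs zh:", cosine(vectors[0], vectors[1]))
    print("zh vs en:", cosine(vectors[0], vectors[2]))
```

Calling `demo()` downloads the model and prints two similarity scores. The actual values depend on the model weights, so none are shown here.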

Good For

  • Chinese-centric applications: Ideal for use cases primarily involving Chinese text, with some English content.
  • Text Similarity: Accurately measures similarity between homogeneous texts.
  • Text Retrieval: Effective for retrieving relevant information from heterogeneous text sources.
  • Text Classification: Demonstrates strong performance in classifying Chinese texts across various domains.
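
To make the retrieval use case and the nDCG@10 figure quoted above more concrete, here is a small sketch of ranking passages by cosine similarity and scoring a ranking with nDCG@k. It works on plain lists of floats, so any embedding source can feed it. The function names `top_k` and `ndcg_at_k` are hypothetical, and the linear-gain form of nDCG used here is an assumption; the source does not say which variant the benchmark applies.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def top_k(query: Sequence[float],
          passages: Sequence[Sequence[float]],
          k: int = 10) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query, p)) for i, p in enumerate(passages)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def ndcg_at_k(ranked_relevance: Sequence[float], k: int = 10) -> float:
    """nDCG@k with linear gain.

    ranked_relevance holds the relevance label of every judged candidate,
    listed in the order the system ranked them.
    """
    def dcg(rels: Sequence[float]) -> float:
        # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), ...
        return sum(r / math.log2(rank + 2) for rank, r in enumerate(rels[:k]))

    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0.0 else 0.0
```

In practice the query and passage vectors would come from the embedding model, and the relevance labels from a judged benchmark. The toy vectors used to check this sketch are not real embeddings.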