Overview
The gemma-2-mitra-embedding model, developed by buddhist-nlp, is a specialized multilingual sentence embedding model built upon the Gemma 2 architecture. It functions as an encoder, transforming input text into L2-normalized embeddings, primarily for semantic similarity and retrieval tasks. A key feature is its design for asymmetric inputs, requiring a specific instruction-based format for queries and raw text for corpus passages.
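The asymmetric query/corpus format can be sketched as a pair of small helpers. This is a hypothetical illustration: the source states only that queries use an instruction-based template with `<instruct>` and `<query>` tokens while corpus passages are raw text, so the exact template layout below (instruction and query on separate tagged lines) is an assumption, not the model's documented format.

```python
# Hypothetical helpers illustrating the asymmetric input convention.
# ASSUMPTION: the concrete template layout is not specified in this card;
# only the <instruct> / <query> tokens and the raw-text corpus side are.

def format_query(task_instruction: str, query: str) -> str:
    # Query side: instruction-aware, wrapped in the special tokens.
    return f"<instruct>{task_instruction}\n<query>{query}"

def format_passage(passage: str) -> str:
    # Corpus side: passed through as raw text, no template.
    return passage
```

Only the query side changes with the task; the corpus can be encoded once and reused across different instructions.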
Key Capabilities
- Multilingual Semantic Similarity: Excels at comparing sentences across languages including Sanskrit, Tibetan, Pali, Chinese, and English.
- Retrieval Systems: Optimized for nearest-neighbor search by encoding queries with an `<instruct>` template and corpus passages as raw text.
- Cross-Lingual Alignment: Specifically used within the Mitra alignment stack for sentence-level alignment of Buddhist texts.
- Instruction-Aware Embeddings: Utilizes special tokens (`<instruct>`, `<query>`) to generate context-specific embeddings for queries.
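Because the model emits L2-normalized embeddings, nearest-neighbor search reduces to a dot product (which equals cosine similarity for unit-norm vectors). The sketch below uses random stand-in vectors rather than real model outputs, purely to show the retrieval arithmetic.

```python
import numpy as np

# Toy stand-ins for model outputs: random vectors, L2-normalized as the
# model's embeddings are. With unit-norm vectors, dot product == cosine
# similarity, so corpus search is a single matrix-vector multiply.
rng = np.random.default_rng(0)
corpus_emb = rng.normal(size=(4, 8))
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

# Simulate a query embedding close to corpus passage 2.
query_emb = corpus_emb[2] + 0.01 * rng.normal(size=8)
query_emb /= np.linalg.norm(query_emb)

scores = corpus_emb @ query_emb   # cosine similarities, one per passage
best = int(np.argmax(scores))     # index of the nearest passage
```

In a real pipeline the same ranking step applies unchanged; only the embeddings would come from the model instead of `rng`.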
Good For
- Buddhist NLP Research: Ideal for tasks involving ancient and modern Buddhist texts in various languages.
- Multilingual RAG/Search: Applications requiring robust multilingual, instruction-aware query/corpus embeddings.
- Custom Alignment Pipelines: Integration into systems like Bertalign for enhanced sentence alignment.
- Any application needing L2-normalized sentence vectors for semantic comparison in the supported languages.