Name: sdadas/stella-pl API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: sdadas

Stella-PL: Bilingual Polish-English Text Encoder

Stella-PL is a 1.5-billion-parameter text encoder developed by sdadas, built on the stella_en_1.5B_v5 architecture. It was adapted to Polish through multilingual knowledge distillation on a corpus of 20 million Polish-English text pairs. The model produces 1024-dimensional embeddings for both languages, so texts can be compared semantically within and across Polish and English.
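
Embeddings of this kind are typically compared with cosine similarity. The following is a minimal sketch of that comparison, not code from the model card: the two vectors are synthetic stand-ins with the stated 1024 dimensions rather than real model output, and the function name is illustrative.

```python
import math

EMBEDDING_DIM = 1024  # embedding size stated in the description above


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Synthetic stand-ins for a Polish and an English embedding (NOT model output).
polish_vec = [1.0] * EMBEDDING_DIM
english_vec = [1.0] * (EMBEDDING_DIM // 2) + [0.0] * (EMBEDDING_DIM // 2)

score = cosine_similarity(polish_vec, english_vec)
print(round(score, 4))
```

Scores near 1 indicate vectors pointing in nearly the same direction. With real embeddings, a Polish sentence and its English translation would be expected to score higher than an unrelated pair.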

Key Capabilities

Bilingual Encoding: Processes both Polish and English texts into a unified embedding space.
Cross-Lingual Semantic Search: Enables retrieval and similarity comparisons between Polish and English content.
High Performance: Achieves an NDCG@10 of 60.52 on the Polish Information Retrieval Benchmark (PIRB).
Optimized for Retrieval: Uses specific instruction prefixes for retrieval and semantic similarity tasks, similar to the original Stella model.
Efficient Processing: Supports Flash Attention 2 for faster inference.
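
For readers unfamiliar with the NDCG@10 metric cited above, here is a rough sketch of how it is computed for a single query. The relevance labels are hypothetical, and the ideal ranking is derived only from the retrieved list, which is a simplification. The PIRB figure is presumably an average over many queries and datasets reported on a 0-100 scale; that reading is an assumption, not something the description states.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (ranks start at 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the best reordering.

    Simplification: the ideal ordering is built from the labels of the
    retrieved results only, not from every relevant document in the corpus.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels for one query's ranked results (1 = relevant, 0 = not).
ranked_relevances = [1, 0, 1, 0, 0]
ndcg = ndcg_at_k(ranked_relevances)
print(round(ndcg, 4))
```

A perfect ranking scores 1.0; placing a relevant result lower in the list reduces the score because each rank is discounted logarithmically.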

Good For

Information Retrieval: Ideal for search applications requiring relevant passage retrieval in Polish or English, or cross-lingual search.
Semantic Similarity: Suitable for tasks like identifying semantically similar sentences or documents in either language.
Cross-Lingual Applications: Suitable for systems that need to understand and compare meaning across Polish and English.
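
The retrieval use cases listed above come down to ranking documents by their similarity to a query embedding. The sketch below illustrates that step with toy 3-dimensional vectors in place of real 1024-dimensional embeddings. The document IDs, vectors, and the `top_k` helper are all hypothetical and are not part of any Stella-PL or Featherless API.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    q = normalize(query_vec)
    scored = []
    for doc_id, vec in doc_vecs.items():
        d = normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of Polish and English documents.
docs = {
    "pl_doc": [0.9, 0.1, 0.0],
    "en_doc": [0.8, 0.2, 0.1],
    "unrelated": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, docs, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Because both languages share one embedding space, the same ranking step applies whether the query and documents are in the same language or not. For large collections, an approximate nearest-neighbor index would normally replace this linear scan.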
