Name: sdadas/stella-pl-retrieval API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: sdadas

Overview

sdadas/stella-pl-retrieval is a 1.5-billion-parameter text encoder derived from the stella_en_1.5B_v5 model and fine-tuned for Polish information retrieval (IR) tasks. It was developed by sdadas through a two-step process:

Multilingual Adaptation: The model was first adapted to Polish using knowledge distillation on a diverse corpus of 20 million Polish-English text pairs.
Contrastive Fine-tuning: It was then fine-tuned with a contrastive loss on a dataset of 1.4 million queries, where positive and negative passages were selected using the BAAI/bge-reranker-v2.5-gemma2-lightweight reranker.
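To make the contrastive fine-tuning step more concrete, here is a minimal sketch of an InfoNCE-style loss for a single query. The source only says "contrastive loss", so the exact formulation, the temperature value, and the function names below are assumptions for illustration, not the author's training code.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one query (illustrative; temperature is assumed)."""
    # Logits: the positive passage first, then every negative passage.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive passage.
    return log_sum_exp - logits[0]

# Toy 2-dimensional embeddings standing in for real model outputs.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce_loss(query, positive, negatives)
# Swapping in an unrelated passage as the "positive" yields a much larger loss.
loss_bad = info_nce_loss(query, negatives[0], [positive, negatives[1]])
```

The loss is small when the query embedding is closer to the positive passage than to any negative, which is why the quality of the reranker-selected positives and negatives matters.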

The model encodes texts into 1024-dimensional vectors and is designed for retrieving passages relevant to Polish queries.
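Once queries and passages are encoded as vectors, retrieval typically reduces to ranking passages by vector similarity. The sketch below assumes cosine similarity as the scoring function and uses toy 3-dimensional vectors in place of the model's 1024-dimensional embeddings; `rank_passages` is a hypothetical helper, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors; real embeddings from the model would have 1024 dimensions.
query_vec = [0.2, 0.9, 0.1]
passage_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.1, 0.9],
]

ranking = rank_passages(query_vec, passage_vecs)
best_index = ranking[0][0]  # index of the passage closest to the query
```

For large collections, the same scoring is usually delegated to a vector index rather than computed pairwise in Python.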

Key Capabilities

Specialized Polish IR: Optimized for information retrieval in the Polish language.
High Performance: Achieves an NDCG@10 of 62.32 on the Polish Information Retrieval Benchmark (PIRB).
Prompt-based Usage: Uses task-specific prompts for retrieval and symmetric tasks, consistent with the original stella_en_1.5B_v5.
Efficient Encoding: Transforms texts into 1024-dimensional embeddings.
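For readers unfamiliar with the NDCG@10 metric cited above, the following sketch shows one common way to compute it for a single query, using linear gain. It assumes the relevance list covers every judged passage for the query; benchmark implementations may differ in gain function and in how the ideal ranking is built, and a reported figure such as 62.32 is presumably this value scaled to 0-100 and averaged over many queries.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of returned passages, in ranked order (1 = relevant).
perfect = ndcg_at_k([1, 0, 0])     # relevant passage ranked first
second_place = ndcg_at_k([0, 1])   # relevant passage ranked second
no_relevant = ndcg_at_k([0, 0])    # nothing relevant retrieved
```

A relevant passage at rank 2 instead of rank 1 already costs about a third of the score, which is why the metric rewards models that place the best passage at the very top.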

Good For

Building Polish Search Engines: Ideal for applications requiring precise document or passage retrieval for Polish queries.
Semantic Search in Polish: Can be used for semantic similarity tasks within Polish text, though a more versatile encoder like sdadas/stella-pl might be preferred for broader semantic tasks.
Research in Polish NLP: Provides a strong baseline for further research and development in Polish information retrieval.
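Since the model relies on prompts for retrieval and symmetric tasks, queries generally need a task prefix before encoding. The sketch below uses the instruction strings documented for the original stella_en_1.5B_v5; whether this model expects exactly the same strings is an assumption that should be checked against the full model card. `build_query` is a hypothetical helper, and passages are assumed to be encoded without any prefix.

```python
# Prefixes as documented for stella_en_1.5B_v5 (assumed to carry over here).
S2P_PREFIX = (
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query.\nQuery: "
)
S2S_PREFIX = "Instruct: Retrieve semantically similar text.\nQuery: "

def build_query(text, task="retrieval"):
    """Prepend the task prefix to a query; passages are left unprefixed."""
    prefixes = {"retrieval": S2P_PREFIX, "symmetric": S2S_PREFIX}
    if task not in prefixes:
        raise ValueError(f"unknown task: {task!r}")
    return prefixes[task] + text.strip()

# Polish query: "How long does it take to fly from Warsaw to Krakow?"
retrieval_query = build_query("Jak długo trwa lot z Warszawy do Krakowa?")
symmetric_query = build_query("Pogoda w Gdańsku", task="symmetric")
```

The resulting strings would then be passed to the encoder in place of the raw query text, while documents are encoded as-is.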
