The sdadas/stella-pl-retrieval model is a 1.5 billion parameter text encoder developed by sdadas, based on stella_en_1.5B_v5 and fine-tuned for Polish information retrieval tasks. It was adapted for Polish using multilingual knowledge distillation and further fine-tuned with contrastive loss on 1.4 million queries. This model transforms texts into 1024-dimensional vectors and is specifically optimized for high performance in Polish information retrieval, achieving an NDCG@10 of 62.32 on the Polish Information Retrieval Benchmark.
Overview
sdadas/stella-pl-retrieval is a 1.5-billion-parameter text encoder, a specialized version of the stella_en_1.5B_v5 model fine-tuned for Polish information retrieval (IR) tasks. It was developed by sdadas in two steps:
- Multilingual Adaptation: Initially adapted for Polish using a knowledge distillation method on a diverse corpus of 20 million Polish-English text pairs.
- Contrastive Fine-tuning: Further fine-tuned with contrastive loss on a dataset of 1.4 million queries, where positive and negative passages were selected using the BAAI/bge-reranker-v2.5-gemma2-lightweight reranker.
This model encodes texts into 1024-dimensional vectors and is specifically designed to excel in retrieving relevant passages for Polish queries.
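To make the contrastive fine-tuning step above more concrete, here is a minimal sketch of an InfoNCE-style loss for a single query with one positive and several negative passages. The source does not state which contrastive objective, temperature, or number of negatives was actually used, so the function name `info_nce_loss`, the temperature of 0.05, and the toy 3-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one query (assumed objective, not confirmed by the source).

    The loss is the negative log-softmax of the positive passage's score
    among the scores of all candidate passages.
    """
    scores = [cosine(query, positive) / temperature]
    scores += [cosine(query, neg) / temperature for neg in negatives]
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[0]


# Toy 3-dimensional embeddings (the real model produces 1024 dimensions).
query = [1.0, 0.0, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A positive passage close to the query yields a small loss ...
low = info_nce_loss(query, [0.9, 0.1, 0.0], negatives)
# ... while a "positive" that is no closer than the negatives yields a larger one.
high = info_nce_loss(query, [0.0, 1.0, 0.1], negatives)
print(low < high)
```

Minimizing a loss of this shape pulls a query's embedding toward its positive passage and pushes it away from the negatives, which is why the quality of the reranker-selected positives and negatives matters.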
Key Capabilities
- Specialized Polish IR: Optimized for information retrieval in the Polish language.
- High Performance: Achieves an NDCG@10 of 62.32 on the Polish Information Retrieval Benchmark (PIRB).
- Prompt-based Usage: Utilizes specific prompts for retrieval and symmetric tasks, consistent with the original stella_en_1.5B_v5.
- Efficient Encoding: Transforms texts into 1024-dimensional embeddings.
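For readers unfamiliar with the metric quoted above, NDCG@10 can be sketched as follows. This uses the common linear-gain formulation (gain equal to the relevance label, discount of log2(rank + 1)); whether PIRB computes exactly this variant is an assumption, and the relevance labels below are invented for illustration. The reported 62.32 appears to be on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for the passages returned for one query,
# listed in the order the retriever ranked them (1 = relevant, 0 = not).
ranked_relevances = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
score = ndcg_at_k(ranked_relevances)
print(round(score, 4))
```

A benchmark score is typically obtained by averaging this per-query value over all queries in a task.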
Good For
- Building Polish Search Engines: Ideal for applications requiring precise document or passage retrieval for Polish queries.
- Semantic Search in Polish: Can be used for semantic similarity tasks within Polish text, though a more versatile encoder like sdadas/stella-pl might be preferred for broader semantic tasks.
- Research in Polish NLP: Provides a strong baseline for further research and development in Polish information retrieval.
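The prompt-based retrieval workflow described above can be sketched as follows. This assumes the model loads through the `sentence-transformers` library with `trust_remote_code=True` and that queries take the same instruction prefix as stella_en_1.5B_v5. The prefix string and the helper names `build_query`, `rank_passages`, and `search` are assumptions for illustration and should be checked against the official model card.

```python
import math

# Assumed retrieval instruction, following the stella_en_1.5B_v5 convention;
# verify against the model card before relying on it.
RETRIEVAL_PREFIX = (
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query.\nQuery: "
)


def build_query(text, prefix=RETRIEVAL_PREFIX):
    """Prepend the retrieval instruction to a query; passages get no prefix."""
    return prefix + text


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search(query, passages, model_name="sdadas/stella-pl-retrieval"):
    """Encode a query and passages, then rank the passages (downloads the model)."""
    # Imported here so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name, trust_remote_code=True)
    query_vec = model.encode(build_query(query)).tolist()
    passage_vecs = model.encode(passages).tolist()
    return [passages[i] for i in rank_passages(query_vec, passage_vecs)]


# Toy 2-dimensional vectors stand in for real 1024-dimensional embeddings.
order = rank_passages([1.0, 0.0], [[0.0, 1.0], [0.8, 0.2], [0.5, 0.5]])
print(order)  # prints [1, 2, 0]
```

Calling `search` with a Polish query and a list of candidate passages would download the 1.5-billion-parameter model on first use, so the toy vectors above exercise only the ranking logic. For a real search engine, passage embeddings would normally be computed once and stored in a vector index rather than re-encoded per query.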