szymonrucinski/Curie-7B-v1
Curie-7B-v1 is a 7-billion-parameter decoder-based large language model developed by Szymon Ruciński and adapted for Polish text generation. It was trained with Language Adaptive Pre-training (LAPT) on a high-quality Polish dataset, reaches a perplexity of 3.02, and rivals top Polish encoder-decoder models on the KLEJ challenges. Beyond generating Polish text, it can be adapted efficiently to other NLP tasks, including classification and regression.
Curie-7B-v1: Efficient Polish Language Model
Curie-7B-v1 is a 7-billion-parameter decoder-based LLM developed by Szymon Ruciński that demonstrates how effectively an English LLM can be adapted to Polish. It was trained with Language Adaptive Pre-training (LAPT) on a high-quality 3.11 GB Polish dataset (276 million tokens) and then fine-tuned on the KLEJ challenges.
Key Capabilities & Performance
- Lowest Perplexity: Achieves a perplexity of 3.02 on Polish text generation, the lowest among decoder-based models.
- Efficient Training: Rivals the best Polish encoder-decoder models on 8 of 9 KLEJ tasks while using only 2-3% of the typical dataset size.
- Versatile Adaptation: Can be transformed into classifiers, regressors, and AI assistants for various Polish NLP tasks.
- Benchmark Highlights (KLEJ tasks):
  - NKJP-NER: 93.4
  - CDSC-E: 92.2
  - CDSC-R: 94.9
  - PolEmo2.0-IN: 92.7
  - PSC: 98.6
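For readers unfamiliar with the perplexity figure quoted in the list above: perplexity is conventionally the exponential of the average negative log-likelihood per token. The sketch below shows only that formula. It assumes you already have per-token natural-log probabilities from some model; the helper name `perplexity` and the example values are hypothetical and are not measurements from Curie-7B-v1.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a sequence of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Hypothetical values for illustration only (not model output):
# a model that assigns probability 0.25 to every token has perplexity 4.
example_logprobs = [math.log(0.25)] * 4
print(perplexity(example_logprobs))
```

Lower values mean the model is, on average, less "surprised" by the evaluated text.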
Training Details
The LAPT phase used a 2 GB high-quality extract from the SpeakLeash dataset. Training ran for one epoch and took 106 hours on an NVIDIA RTX A6000 ADA GPU. Further details are available in the research paper: Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish.
Good For
- Generating high-quality Polish text.
- Developing efficient business solutions requiring Polish NLP capabilities.
- Research and development in Polish language modeling with limited data resources.
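As a starting point for the text-generation use case listed above, here is a minimal sketch using the Hugging Face `transformers` library. It assumes the checkpoint can be loaded through the generic `AutoModelForCausalLM` and `AutoTokenizer` classes, that `torch` and `accelerate` are installed, and that enough GPU or CPU memory is available for a 7B model. The function name `generate_polish`, the sampling settings, and the example prompt are illustrative assumptions, not settings recommended by the model author.

```python
MODEL_ID = "szymonrucinski/Curie-7B-v1"  # repository name as given on this page


def generate_polish(prompt, max_new_tokens=64):
    """Generate a Polish continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this function does not require
    # transformers to be installed or the weights to be downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires the `accelerate` package
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,       # illustrative sampling settings (assumption)
        temperature=0.7,
        top_p=0.9,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(generate_polish("Najpiękniejszym miastem w Polsce jest"))
```

Because the model is described as a pre-trained text generator rather than a chat assistant, plain sentence prefixes like the one in the commented call are likely a more natural input than instruction-style prompts.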