szymonrucinski/Curie-7B-v1

Text generation · Model size: 7B · Quantization: FP8 · Context length: 8k · Published: Jan 11, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1 · Open weights

Curie-7B-v1 is a 7 billion parameter decoder-based large language model developed by Szymon Ruciński and fine-tuned specifically for Polish text generation. It uses Language Adaptive Pre-training (LAPT) on a high-quality Polish dataset, achieving a perplexity of 3.02 and rivaling the best Polish encoder-decoder models on the KLEJ benchmark. The model excels at generating Polish text and can be efficiently adapted to a range of NLP tasks, including classification and regression.


Curie-7B-v1: Efficient Polish Language Model

Curie-7B-v1 is a 7 billion parameter decoder-based LLM developed by Szymon Ruciński, showcasing the effectiveness of adapting English LLMs to Polish. It achieves strong performance through Language Adaptive Pre-training (LAPT) on a high-quality 3.11 GB Polish corpus (276 million tokens), followed by task-specific fine-tuning on the KLEJ benchmark.
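
The model can be loaded with the standard Hugging Face Transformers API. Below is a minimal sketch; the prompt and sampling values are illustrative, not a recommended configuration.

```python
# Minimal sketch: loading Curie-7B-v1 with Hugging Face Transformers and
# generating Polish text. The prompt and sampling values are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "szymonrucinski/Curie-7B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model on a single GPU
    device_map="auto",
)

prompt = "Historia Polski zaczyna się"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```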

Key Capabilities & Performance

  • Lowest Perplexity: Achieves a perplexity of 3.02 for Polish text generation among decoder-based models.
  • Efficient Training: Rivals the best Polish encoder-decoder models on 8 out of 9 KLEJ tasks, using only 2-3% of the typical dataset size.
  • Versatile Adaptation: Can be transformed into classifiers, regressors, and AI assistants for various Polish NLP tasks (a classification sketch follows this list).
  • Benchmark Highlights (KLEJ tasks):
    • NKJP-NER: 93.4
    • CDSC-E: 92.2
    • CDSC-R: 94.9
    • PolEmo2.0-IN: 92.7
    • PSC: 98.6
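
As a rough illustration of the adaptation claim above, the checkpoint can be wrapped with a sequence-classification head, assuming its decoder architecture is one that Transformers exposes via AutoModelForSequenceClassification (most 7B decoder families are). The label count and the downstream training loop are placeholders, not values from the paper.

```python
# Hedged sketch: adding a sequence-classification head on top of the decoder
# for a Polish classification task (e.g. a KLEJ-style dataset). The number of
# labels and the downstream training loop are placeholders, not paper values.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "szymonrucinski/Curie-7B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # decoder-only models usually lack a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,  # placeholder: e.g. three sentiment classes
)
model.config.pad_token_id = tokenizer.pad_token_id

# From here, fine-tune with the Trainer API or a custom loop on a Polish
# classification dataset; only the new classification head starts untrained.
```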

Training Details

The LAPT phase utilized a 2 GB high-quality extract from the SpeakLeash dataset. Training ran for a single epoch over 106 hours on an NVIDIA RTX A6000 Ada GPU. Further details are available in the research paper: Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish.
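
For readers unfamiliar with LAPT, it amounts to continued causal-language-model training of an English base model on a Polish corpus. The sketch below is illustrative only: the base-model placeholder, corpus file, and hyperparameters are assumptions and do not come from the paper; only the single training epoch is taken from this card.

```python
# Illustrative sketch of language-adaptive pre-training (LAPT): continued
# causal-LM training of an English base model on a Polish corpus. The base
# model placeholder, corpus file, and hyperparameters are assumptions for
# illustration; only the single training epoch comes from this card.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model_id = "<english-base-llm>"  # placeholder for the English base model

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Hypothetical local corpus of cleaned Polish text, one document per line.
raw = load_dataset("text", data_files={"train": "polish_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="curie-lapt",
    num_train_epochs=1,             # the card reports a single epoch
    per_device_train_batch_size=1,  # placeholder; real batch size is not stated here
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    bf16=True,
    logging_steps=100,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```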

Good For

  • Generating high-quality Polish text.
  • Developing efficient business solutions requiring Polish NLP capabilities.
  • Research and development in Polish language modeling with limited data resources.

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model adjust the following sampler settings: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
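
When running the model locally with Transformers, most of these settings map directly onto generate() keyword arguments; frequency_penalty and presence_penalty are OpenAI-style API parameters with no direct generate() equivalent. The values below are placeholders, not the actual user configurations, and the snippet reuses model, tokenizer, and inputs from the loading example above.

```python
# Illustrative mapping of the sampler settings above onto Transformers
# generation kwargs, reusing `model`, `tokenizer`, and `inputs` from the
# loading example. The numeric values are placeholders, not the actual
# Featherless user configurations.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,         # sharpens or flattens the token distribution
    top_p=0.95,              # nucleus sampling
    top_k=50,                # sample only from the k most likely tokens
    repetition_penalty=1.1,  # discourage verbatim repetition
    min_p=0.05,              # requires a recent transformers release
    max_new_tokens=256,
)
# frequency_penalty and presence_penalty are OpenAI-style API parameters;
# they are set on the serving request rather than passed to generate().
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```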