Name: overthelex/qwen2.5-14b-edrsr-legal-uk API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: overthelex

Model Overview

overthelex/qwen2.5-14b-edrsr-legal-uk is a 14.8 billion parameter language model derived from Qwen2.5-14B. It has undergone continued pretraining (CPT) on the Ukrainian legal domain, using the Unified State Register of Court Decisions of Ukraine (EDRSR) corpus.

Key Characteristics

Domain-Specific Training: Continued pretraining on 33.9 million Ukrainian court decisions, totaling 161.4 billion tokens, to adapt the model to Ukrainian legal language.
Performance: Perplexity on the EDRSR corpus dropped from 2.84 to 1.28, a reported reduction of 54.8%, indicating strong domain adaptation.
Scaling Experiment: This model is part of a scaling experiment (0.5B, 1.5B, 3B, and 14B models) investigating continued pretraining for low-resource legal languages.
Base Model: This is a base model, not an instruction-tuned one. It is intended for further adaptation rather than direct conversational use.
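
The figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted in this section; the function and variable names are illustrative. With these rounded inputs the relative reduction comes out at about 54.9%, so the quoted 54.8% was presumably computed from unrounded perplexities (an assumption, not something this page confirms).

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop when going from `before` to `after`."""
    return (before - after) / before


# Rounded values quoted in this section.
ppl_base, ppl_cpt = 2.84, 1.28
reduction = relative_reduction(ppl_base, ppl_cpt)

# Corpus size quoted in this section.
decisions = 33.9e6
tokens = 161.4e9
avg_tokens_per_decision = tokens / decisions

print(f"perplexity reduction: {reduction:.1%}")
print(f"average tokens per decision: {avg_tokens_per_decision:,.0f}")
```

The second figure is only a corpus-wide average; the page gives no information about how decision lengths are distributed.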

Intended Use Cases

Research: Suited to academic research on domain adaptation of large language models for low-resource legal languages.
Downstream Fine-tuning: A starting point for fine-tuning on specific Ukrainian legal NLP tasks.
Perplexity Evaluation: Useful for perplexity-based evaluation on Ukrainian legal texts.
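
For the perplexity use case, the standard definition is the exponential of the mean negative log-likelihood per token. The following is a minimal sketch of that formula only; it assumes the per-token natural-log probabilities have already been obtained from the model by some means, and the sample values are made up for illustration rather than taken from this model.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities: every token predicted with probability 0.5.
sample_logprobs = [math.log(0.5)] * 4
print(perplexity(sample_logprobs))
```

A perplexity near 1 means the model assigns high probability to almost every token, which is why a drop from 2.84 to 1.28 is read as strong adaptation to the corpus.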

Limitations

Not Instruction-Tuned: The model is not trained to follow instructions or hold chat-style conversations; as a base model it continues the text it is given.
Domain Specificity: Its training on Ukrainian court decisions means it may not generalize well to other legal systems or general-purpose tasks.
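
Because a base model continues text rather than following instructions, it is usually prompted with a pattern to complete instead of a request. The sketch below shows one common way to frame such a prompt (few-shot completion). It is an illustration under assumptions: the helper name, the field labels, and the placeholder strings are hypothetical, and nothing here is taken from the model's documentation.

```python
def build_completion_prompt(examples, query_input,
                            input_label="Text", output_label="Summary"):
    """Format (input, output) pairs as a pattern that ends mid-record,
    so a base model's most natural continuation is the missing output."""
    blocks = [
        f"{input_label}: {source}\n{output_label}: {target}"
        for source, target in examples
    ]
    # The final record is left open after the output label.
    blocks.append(f"{input_label}: {query_input}\n{output_label}:")
    return "\n\n".join(blocks)


# Placeholder strings; in practice these would be Ukrainian legal texts.
prompt = build_completion_prompt(
    examples=[("first example text", "first example summary"),
              ("second example text", "second example summary")],
    query_input="new text to process",
)
print(prompt)
```

Whether this framing produces useful output for a given task is an empirical question; the page itself recommends fine-tuning for downstream use.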
