VSSA-SDSA/LT_AI_DLKVM
LT_AI_DLKVM is a 1.04 billion parameter causal language model developed by the State Digital Solutions Agency (SDSA), based on Llama3 design principles. It is pretrained from scratch for the Lithuanian language, uses a custom 32,000-token tokenizer, and supports a 32,768-token context length. The model is intended for research, pretraining, and adaptation in Lithuanian text generation and NLP tasks, serving as a base generative model for further fine-tuning.
What is LT_AI_DLKVM?
LT_AI_DLKVM is a 1.04 billion parameter causal language model, developed by the State Digital Solutions Agency (SDSA) as part of the BLKT-VMS pipeline. It is built on Llama3 design principles and is specifically designed for the Lithuanian language. The model was pretrained from scratch in two stages, using the Lithuanian Text Corpus and a custom 32,000-token tokenizer.
Key Capabilities & Features
- Lithuanian Language Focus: Exclusively trained for Lithuanian text generation and NLP tasks.
- Long Context Window: Supports a 32,768-token context length, enabling processing of long Lithuanian documents.
- Base Generative Model: Intended for research, pretraining, and further fine-tuning for specific applications.
- Custom Tokenizer: Utilizes a specially trained 32,000-token tokenizer for optimal performance with Lithuanian.
- Two-Stage Training: Pretraining from scratch at an 8,196-token context, followed by long-context training at 32,768 tokens, on 8 NVIDIA H100-SXM5-80GB GPUs.
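A minimal loading sketch with the Hugging Face transformers library, assuming the checkpoint is transformers-compatible. The repo id `VSSA-SDSA/LT_AI_DLKVM` is taken from this card's title; adjust it if the actual hub path differs.

```python
def load_lt_ai_dlkvm(repo_id: str = "VSSA-SDSA/LT_AI_DLKVM"):
    """Load the tokenizer and model weights.

    Requires `transformers` and `torch`; the repo id default is an
    assumption based on this card's title.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_lt_ai_dlkvm()
    # The card states a 32,000-token vocabulary and a 32,768-token context;
    # these config fields are the usual places to verify that.
    print(len(tokenizer), model.config.max_position_embeddings)
```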
Should I use this for my use case?
LT_AI_DLKVM is ideal for:
- Research and Development: Experimenting with Lithuanian NLP, language generation, and domain adaptation.
- Base Model for Fine-tuning: Projects requiring robust Lithuanian text generation that can be specialized for chat, summarization, classification, or domain-specific content.
- Long-Context Applications: Scenarios where processing and generating long Lithuanian documents or conversations are critical.
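For the use cases above, a hedged generation sketch: `generate_lithuanian` is a hypothetical helper (not part of the release), and the repo id is assumed from this card's title. Sampling parameters are illustrative defaults, not recommended settings.

```python
def generate_lithuanian(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a Lithuanian continuation of `prompt`.

    Hypothetical helper for illustration; requires `transformers` and `torch`.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "VSSA-SDSA/LT_AI_DLKVM"  # assumed from this card's title
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,       # illustrative sampling settings
            temperature=0.7,
            top_p=0.9,
        )
    # Drop the prompt tokens so only the continuation is returned.
    continuation = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(continuation, skip_special_tokens=True)
```

Because this is a base model, expect raw text continuation rather than instruction-following behavior.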
Limitations: This model is a base causal language model and is not instruction-tuned or task-specialized by default. It may generate factually inaccurate or biased content and is not suitable for high-stakes applications without additional fine-tuning, safeguards, and validation. Performance outside Lithuanian-centric domains may be less reliable.