AI-Sweden-Models/Llama-3-8B is an 8 billion parameter base language model developed by AI-Sweden-Models, continuing the pretraining of Meta-Llama-3-8B. Pretraining was continued on a 227 billion token subset of The Nordic Pile covering Swedish, Norwegian, and Danish. The model is designed as a foundation for further fine-tuning on particular use cases, targeting Nordic language generation and understanding with an 8192 token context length.
AI-Sweden-Models/Llama-3-8B Overview
This model is a specialized 8 billion parameter base language model developed by AI-Sweden-Models. It builds on the meta-llama/Meta-Llama-3-8B architecture, with all model parameters updated during training (full fine-tuning rather than an adapter-based approach).
Key Characteristics
- Nordic Language Specialization: Continued pretraining used a 227 billion token subset of The Nordic Pile spanning Swedish, Norwegian, and Danish text.
- Base Model: Designed as a foundational model, it is intended for further fine-tuning to specific applications and use cases rather than direct instruction-following out-of-the-box. An instruction-tuned version is available separately.
- Training Infrastructure: Training ran on the Rattler supercomputer, using 92 Nvidia A100 GPUs over 30 days.
- Context Length: Supports a sequence length of 8192 tokens, enabling longer inputs and more coherent extended outputs.
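Because this is a base (non-instruct) checkpoint, the simplest way to try it is plain text continuation via the Hugging Face transformers library. The sketch below is illustrative: the sampling settings are assumptions, not official recommendations from the model card.

```python
# Hedged sketch: text continuation with the base model via transformers.
# Sampling parameters (do_sample, top_p) are illustrative assumptions.

MODEL_ID = "AI-Sweden-Models/Llama-3-8B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports live inside the function so the sketch can be read and reused
    # without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # A base model continues text; it does not follow chat-style instructions.
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage: prompt continuation in Swedish
# generate("Stockholm är huvudstaden i")
```

For instruction-following behavior, use the separately released instruction-tuned version instead of prompting this base checkpoint directly.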
Intended Use
This model is ideal for developers and researchers building applications that require strong performance in Nordic languages. As a base model, it is well suited to:
- Custom Fine-tuning: Adapting to specific domains, tasks, or instruction formats in Swedish, Norwegian, or Danish.
- Research: Exploring language understanding and generation within Nordic linguistic contexts.
- Multilingual Applications: Serving as a core component for applications targeting Scandinavian users.
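For the custom fine-tuning use case above, a minimal continued-training setup can be sketched with the transformers Trainer. Everything below is a hedged example: the output directory, hyperparameters, and in-memory dataset are placeholders, not values published with the model.

```python
# Hedged sketch: minimal causal-LM fine-tuning of the base model.
# Hyperparameters and paths are illustrative placeholders.

MODEL_ID = "AI-Sweden-Models/Llama-3-8B"

def build_trainer(train_texts, output_dir="./llama3-nordic-ft"):
    # Lazy imports keep the sketch importable without the libraries installed.
    import torch
    from torch.utils.data import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers define no pad token
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    class TextDataset(Dataset):
        """Tokenizes raw strings, truncated to the model's 8192-token context."""
        def __init__(self, texts):
            self.examples = [
                tokenizer(t, truncation=True, max_length=8192) for t in texts
            ]
        def __len__(self):
            return len(self.examples)
        def __getitem__(self, i):
            return self.examples[i]

    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=TextDataset(train_texts),
        # mlm=False yields standard next-token (causal LM) labels
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )

# Example usage:
# trainer = build_trainer(["Svensk text för domänanpassning ...", ...])
# trainer.train()
```

Full fine-tuning of an 8B model at this precision requires substantial GPU memory; parameter-efficient methods such as LoRA are a common lighter-weight alternative for the same adaptation goal.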