Model Overview
NbAiLab/nb-notram-llama-3.2-1b-instruct is a 1-billion-parameter model from the National Library of Norway (NB-AiLab), part of its "NB-Llama-3.x" and "NoTraM" series. It is fine-tuned from Meta's Llama-3.2-1B-Instruct to substantially improve instruction-following in Norwegian Bokmål and Norwegian Nynorsk while preserving its English capabilities. The model is an experiment in adapting modern open-weight models to Norwegian using only publicly available data, explicitly excluding legal-deposit material.
Key Capabilities & Features
- Multilingual Instruction-Following: Enhanced performance for Norwegian Bokmål, Norwegian Nynorsk, and English.
- Concise Responses: Tends to produce shorter, more concise answers, reflecting its current instruction-tuning recipe.
- Public Data Training: Trained exclusively on publicly available datasets and synthetic data, including CulturaX, HPLT monolingual, Norwegian Colossal Corpus, and Wikipedia.
- Advanced Data Curation: Utilizes a data selection and filtering approach inspired by FineWeb, incorporating "Corpus Quality Classifiers" to prioritize educational value and linguistic quality.
- Research Focus: Explores techniques for adapting instruction-tuned models to the Norwegian language, culture, and history, aiming to reduce "knowledge pocketing" and improve generalization.
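The FineWeb-inspired curation described above amounts to scoring each document with a quality classifier and keeping only those above a threshold. The sketch below illustrates that pattern only; the scoring function, threshold, and classifier behavior are assumptions for illustration, not the actual NB-AiLab pipeline.

```python
def filter_corpus(documents, score_fn, threshold=0.5):
    """Keep only documents whose quality score meets the threshold.

    score_fn stands in for a corpus-quality classifier rating
    educational value / linguistic quality in [0, 1]; the real
    NB-AiLab classifiers and cutoff are not specified here.
    """
    return [doc for doc in documents if score_fn(doc) >= threshold]


# Toy scorer, purely illustrative: longer texts score higher.
def toy_score(doc: str) -> float:
    return min(len(doc) / 100.0, 1.0)
```

In a real pipeline the scorer would be a trained classifier (e.g. a small transformer or fastText-style model) applied at scale before pre-training.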
Intended Use Cases
- Dialogue Systems: Suitable for assistant-style applications in Norwegian (Bokmål/Nynorsk) and English.
- Summarization & Q&A: Effective for summarization and question-answering tasks in Bokmål or Nynorsk.
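For the assistant-style use cases above, the model can be loaded with Hugging Face `transformers`. This is a minimal sketch assuming the standard Llama-3.2 chat template is bundled with the model's tokenizer; generation settings and the sample question are illustrative, not prescribed by the model card.

```python
MODEL_ID = "NbAiLab/nb-notram-llama-3.2-1b-instruct"


def build_chat(question: str) -> list[dict]:
    """Wrap a user question in the message format expected by
    tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer.apply_chat_template(
        build_chat(question), add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    # Illustrative Nynorsk prompt; any Bokmål/Nynorsk/English question works.
    print(generate_answer("Kva er hovudstaden i Noreg?"))
```

The first call downloads the model weights (~1B parameters); for repeated queries, load the tokenizer and model once and reuse them.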
Limitations
- May produce incorrect or fabricated statements.
- Norwegian cultural/historical knowledge can be uneven or "pocketed" (prompt-sensitive).
- Safety alignment is limited; careful evaluation is recommended for specific use cases.