NbAiLab/nb-notram-llama-3.2-1b-instruct

Hosted on Hugging Face.

  • Task: text generation
  • Model size: 1B parameters
  • Precision: BF16
  • Context length: 32k tokens
  • Published: Nov 28, 2024
  • License: llama3.2
  • Architecture: Transformer

NbAiLab/nb-notram-llama-3.2-1b-instruct is a 1 billion parameter instruction-tuned causal language model developed by the National Library of Norway (NB-AiLab). Built on Meta's Llama-3.2-1B-Instruct, it is fine-tuned to enhance instruction-following in Norwegian Bokmål and Nynorsk while maintaining strong English performance. The model adapts instruction-tuned models to Norwegian language, culture, and history using only publicly available data, aiming to reduce "knowledge pocketing" and improve generalization.


Model Overview

NbAiLab/nb-notram-llama-3.2-1b-instruct is a 1 billion parameter model from the National Library of Norway (NB-AiLab), part of its "NB-Llama-3.x" and "NoTraM" series. It is fine-tuned from Meta's Llama-3.2-1B-Instruct to significantly improve instruction-following in Norwegian Bokmål and Norwegian Nynorsk while preserving the base model's English capabilities. The model is an experiment in adapting modern open-weight models for Norwegian using only publicly available data, explicitly excluding legal deposit material.

Key Capabilities & Features

  • Multilingual Instruction-Following: Enhanced performance for Norwegian Bokmål, Norwegian Nynorsk, and English.
  • Concise Responses: Tends to produce shorter, more concise answers, reflecting its current instruction-tuning recipe.
  • Public Data Training: Trained exclusively on publicly available datasets and synthetic data, including CulturaX, HPLT monolingual, Norwegian Colossal Corpus, and Wikipedia.
  • Advanced Data Curation: Utilizes a data selection and filtering approach inspired by FineWeb, incorporating "Corpus Quality Classifiers" to prioritize educational value and linguistic quality.
  • Research Focus: Explores techniques to adapt instruction-tuned models to Norwegian language, culture, and history, aiming to reduce "knowledge pocketing" and improve generalization.

Intended Use Cases

  • Dialogue Systems: Suitable for assistant-style applications in Norwegian (Bokmål/Nynorsk) and English.
  • Summarization & Q&A: Effective for summarization and question-answering tasks in Bokmål or Nynorsk.
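The assistant-style use cases above can be sketched with the Hugging Face `transformers` library. This is a minimal, hedged example, not an official recipe from the model card: it assumes `transformers` and `torch` are installed, that the weights can be downloaded from the Hub, and that the model ships a standard Llama 3.x chat template. The Bokmål prompt and the `chat` helper are illustrative choices, not part of the original documentation.

```python
# Sketch: assistant-style generation in Norwegian Bokmål with transformers.
# Model loading is kept inside the helper so the prompt setup below is cheap.

MODEL_ID = "NbAiLab/nb-notram-llama-3.2-1b-instruct"

# A Bokmål question-answering conversation in the Llama 3.x chat format.
messages = [
    {"role": "system", "content": "Du er en hjelpsom assistent som svarer på norsk."},
    {"role": "user", "content": "Hva er Nasjonalbiblioteket, og hvor ligger det?"},
]

def chat(messages, max_new_tokens=256):
    """Lazily load the model and generate a reply for `messages`."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    # Apply the model's chat template and add the generation prompt.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example usage (downloads ~1B parameters on first run):
#     print(chat(messages))
```

For Nynorsk or English, only the message contents change; the same chat-template flow applies.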

Limitations

  • May produce incorrect or fabricated statements.
  • Norwegian cultural/historical knowledge can be uneven or "pocketed" (prompt-sensitive).
  • Safety alignment is limited; careful evaluation is recommended for specific use cases.