norallm/normistral-11b-long

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Dec 8, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

NorMistral-11b-long is an 11.4 billion parameter causal language model developed by the Language Technology Group (LTG) at the University of Oslo as part of the NORA.LLM family. It is a length-extended version of NorMistral-11b-warm with a context length increased to 32,768 tokens. It was continually trained on 50 billion subword tokens of Scandinavian, Sámi, English, and code data, which makes it particularly well suited to research on Norwegian and the Sámi languages.


NorMistral-11b-long: Extended Context for Scandinavian Languages

NorMistral-11b-long is an 11.4 billion parameter causal language model developed by the Language Technology Group at the University of Oslo (LTG) within the NORA.LLM initiative. It is a length-extended version of NorMistral-11b-warm, designed with a significantly increased context window of 32,768 tokens.
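As a rough starting point, the model can be loaded like any other causal LM on Hugging Face. Below is a minimal sketch, assuming the weights are published under the repository id norallm/normistral-11b-long and that the standard transformers causal-LM API applies; the dtype, prompt, and generation settings are illustrative choices, not recommendations from LTG:

```python
# Minimal sketch: loading NorMistral-11b-long with Hugging Face transformers.
# Assumes the weights live at "norallm/normistral-11b-long"; settings below
# are illustrative, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "norallm/normistral-11b-long"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 11.4B model in memory
    device_map="auto",           # requires the accelerate package
)

prompt = "Nordlyset oppstår når"  # "The northern lights occur when" (Bokmål)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The card describes a plain causal LM (no instruction tuning is mentioned),
# so we use free-form continuation rather than a chat template.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```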

Key Capabilities & Features

  • Extended Context: Achieves a 32,768 token context length through continual training on an additional 50 billion subword tokens.
  • Multilingual Focus: Training data includes a mix of Scandinavian (Norwegian Bokmål, Nynorsk, Danish, Swedish, Icelandic, Faroese), Sámi, English, and programming code.
  • Optimized Tokenizer: Uses a new tokenizer trained specifically for the target languages, which lowers the subword-to-word split ratio and makes inference substantially faster than with the base Mistral-Nemo-Base-2407 model.
  • Architecture: Based on the Mistral architecture, featuring pre-normalization with RMSNorm, SwiGLU activations, rotary positional embeddings, and grouped-query attention (see the sketch after this list).
  • Research-Oriented: Primarily intended for research purposes, particularly in the domain of low-resource and Scandinavian language processing.
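For readers unfamiliar with the building blocks named in the Architecture bullet, the following PyTorch sketch illustrates two of them: RMSNorm pre-normalization and a SwiGLU feed-forward layer. The dimensions are placeholders, and this is an illustration of the standard Mistral-family blocks, not the model's actual implementation:

```python
# Illustrative sketch of two Mistral-style building blocks: RMSNorm and SwiGLU.
# Dimensions are placeholders; not NorMistral's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the root-mean-square of the features (no mean subtraction,
        # unlike LayerNorm).
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: a SiLU-gated linear unit, as used in Mistral-family FFNs.
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 512)
y = SwiGLU(512, 1408)(RMSNorm(512)(x))
print(y.shape)  # torch.Size([2, 16, 512])
```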

Good For

  • Research in Scandinavian NLP: Ideal for academic and research applications focusing on Norwegian, Sámi, and other Nordic languages.
  • Long-Context Tasks: Suitable for tasks that involve processing long inputs, such as entire documents, thanks to its 32,768 token context window.
  • Continual Training Studies: A practical example of continual training for context-length extension, following the methodology outlined in the paper "Small Languages, Big Models: A Study of Continual Training on Languages of Norway."
  • Efficient Inference: Benefits from a custom tokenizer that speeds up inference for its target languages; a rough fertility check is sketched below.
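The tokenizer claim can be spot-checked by comparing subword-to-word ratios ("fertility") on Norwegian text. A rough sketch, assuming both tokenizers load via transformers and that the base model's repository id is mistralai/Mistral-Nemo-Base-2407 (lower fertility means fewer tokens per word, and hence fewer generation steps):

```python
# Rough sketch: compare tokenizer fertility (subwords per word) on a Norwegian
# sentence. Repository ids are assumptions based on the model names in this card.
from transformers import AutoTokenizer

sentence = "Språkteknologigruppen ved Universitetet i Oslo utvikler språkmodeller."
n_words = len(sentence.split())

for model_id in ("norallm/normistral-11b-long", "mistralai/Mistral-Nemo-Base-2407"):
    tok = AutoTokenizer.from_pretrained(model_id)
    n_subwords = len(tok.tokenize(sentence))
    print(f"{model_id}: {n_subwords} subwords, {n_subwords / n_words:.2f} per word")
```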