NorMistral-11B-Thinking Overview
NorMistral-11B-Thinking is an 11 billion parameter instruction-tuned language model developed by norallm, specifically optimized for the Norwegian language. It is built upon the NorMistral-11B base model and has undergone extensive post-training using a novel fluency-preserving reinforcement learning approach, as detailed in their paper "Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages".
Key Capabilities
- Norwegian Language Proficiency: Excels in understanding and generating Norwegian (Bokmål and Nynorsk) text.
- "Thinking" Capability: Generates an internal reasoning trace (enclosed in
<think>...</think> tokens) before producing the final response, offering transparency into its decision-making process. - Instruction Following: Highly capable of following complex instructions due to supervised finetuning on English responses and reasoning traces from Kimi-K2-Thinking.
- Reinforcement Learning: Utilizes direct reinforcement learning from AI feedback (d-RLAIF) with Mistral-Large-Instruct-2411 as the reward model, enhancing its fluency and alignment.
- Apache 2.0 License: Freely available for use, with model weights under an open license.
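Because the reasoning trace is delimited by explicit <think>...</think> markers, downstream applications typically separate it from the user-facing answer. A minimal post-processing sketch (the marker format follows the description above; the example output string is hypothetical):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning trace, final answer).

    Assumes the reasoning is wrapped in <think>...</think>, as in
    NorMistral-11B-Thinking's output format; returns an empty trace
    if no such block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer

# Hypothetical model output:
raw = "<think>Brukeren spør om hovedstaden i Norge.</think>Hovedstaden i Norge er Oslo."
trace, answer = split_reasoning(raw)
print(answer)  # Hovedstaden i Norge er Oslo.
```

This lets an application log or display the trace separately while showing only the final answer to end users.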
Evaluation Highlights
NorMistral-11B-Thinking demonstrates strong performance on a generative version of NorEval, particularly in classification tasks such as NoReC sentiment analysis and NorIdiom, often outperforming models like Llama-3.1-8B and Mistral-Nemo-12B on specific Norwegian benchmarks. While other models lead in certain areas (e.g., Qwen3-15B in NorCSQA and NorOBQA), NorMistral-11B-Thinking shows competitive results and particular strength in Norwegian language understanding.
Good for
- Norwegian Language Applications: Ideal for chatbots, content generation, and language understanding tasks specifically in Norwegian.
- Research and Development: Useful for researchers exploring reinforcement learning, lower-resource language models, and transparent AI reasoning.
- Local Deployment: Available in various GGUF formats for efficient local inference with ollama or llama.cpp.
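For local inference, a GGUF quantization can be run directly with either tool. A hedged sketch; the repository path, file name, and quantization tag below are illustrative placeholders, not confirmed names (check the actual GGUF repository on Hugging Face):

```shell
# Pull and chat with a GGUF quantization via ollama
# (repo path and quant tag are hypothetical).
ollama run hf.co/norallm/NorMistral-11B-Thinking-GGUF:Q4_K_M

# Or run a downloaded GGUF file directly with llama.cpp's CLI:
llama-cli -m NorMistral-11B-Thinking.Q4_K_M.gguf \
  -p "Hva er hovedstaden i Norge?" -n 256
```

Lower-bit quantizations trade some output quality for reduced memory use, which matters for an 11B model on consumer hardware.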