neph1/Mistral-Nemo-Instruct-bellman-12b
Text Generation · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Oct 20, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency Cost: 1 · Open Weights

neph1/Mistral-Nemo-Instruct-bellman-12b is a 12 billion parameter instruction-tuned language model, fine-tuned from Mistral-Nemo-Instruct-2407. Developed by neph1, it specializes in answering questions from a prompt, particularly on Sweden-centric topics, using a dataset derived from Swedish Wikipedia. It is further trained on translated code-feedback data and stories, aims for concise, less verbose responses, and supports a context length of 32768 tokens.


Overview

neph1/Mistral-Nemo-Instruct-bellman-12b is a 12 billion parameter instruction-tuned model, fine-tuned from the Mistral-Nemo-Instruct-2407 base. This version, developed by neph1, is a rank-128 QLoRA fine-tune trained for approximately one epoch. It is optimized for answering questions from a prompt, with a strong focus on Swedish-language content and Sweden-centric questions, leveraging a dataset created from Swedish Wikipedia. The model also incorporates questions from a translated code-feedback dataset and various stories.
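As a minimal sketch of how the model could be queried, assuming it is published on the Hugging Face Hub under the repository id shown here and ships a standard `transformers` chat template (both assumptions, not confirmed by this card):

```python
# Hypothetical inference sketch; repository id and chat-template support
# are assumptions based on the model name, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "neph1/Mistral-Nemo-Instruct-bellman-12b"


def ask(question: str, max_new_tokens: int = 256) -> str:
    """Send a single user question and return the model's answer text."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    # A Sweden-centric question, matching the model's training focus.
    print(ask("Vad är Sveriges huvudstad?"))
```

Given the card's emphasis on concise answers, a modest `max_new_tokens` budget is likely sufficient for direct question answering.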

Key Capabilities

  • Swedish Language Proficiency: Demonstrates improved Swedish language generation with fewer awkward phrasings compared to its base model.
  • Concise Question Answering: Designed to provide fairly short and less verbose answers, making it suitable for direct information retrieval.
  • Sweden-Centric Knowledge: Excels in answering questions related to Sweden, drawing from its training on Swedish Wikipedia data.
  • Code-Feedback Integration: Includes training on translated code-feedback data, potentially enhancing its understanding of technical queries.
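Since the model is fine-tuned from the Mistral instruct family, prompts for the capabilities above are typically wrapped in `[INST] ... [/INST]` blocks. The exact spacing and special tokens below are an illustrative assumption; in practice `tokenizer.apply_chat_template` should be preferred:

```python
# Illustrative Mistral-style prompt builder. The precise template (spacing,
# special tokens) is an assumption here, not taken from this model card.
def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Format (user, assistant) turns as [INST] ... [/INST] blocks.

    An empty assistant string marks the turn awaiting a completion.
    """
    prompt = "<s>"
    for user_msg, assistant_msg in turns:
        prompt += f"[INST] {user_msg} [/INST]"
        if assistant_msg:
            prompt += f" {assistant_msg}</s>"
    return prompt


# Single-turn Swedish question, left open for the model to complete.
prompt = build_prompt([("Vem skrev 'Fritiofs saga'?", "")])
```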

Good for

  • Applications requiring Swedish language question answering.
  • Use cases where concise and direct responses are preferred.
  • Developing chatbots or tools focused on Swedish history, geography, and culture.
  • Experimentation with models that have integrated code-feedback data for improved instruction following.