migtissera/Tess-3-Mistral-Nemo-12B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Aug 13, 2024 · License: apache-2.0 · Architecture: Transformer

Tess-3-Mistral-Nemo-12B is a 12-billion-parameter general-purpose large language model created by Migel Tissera as part of the Tesoro (Tess) series. It supports a 32,768-token context length and is designed for broad, versatile language understanding and generation.


Tess-3-Mistral-Nemo-12B Overview

Tess-3-Mistral-Nemo-12B is a 12-billion-parameter general-purpose large language model developed by Migel Tissera as part of the "Tesoro" (Tess) series, with compute resources sponsored by KindoAI. Its 32,768-token context length lets it process and generate long sequences of text.
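Since the weights are openly published, the model can presumably be loaded through the standard Hugging Face `transformers` causal-LM path. A minimal sketch, assuming the tokenizer ships a chat template (the helper names `build_messages` and `generate_reply` are illustrative, not part of the model card):

```python
MODEL_ID = "migtissera/Tess-3-Mistral-Nemo-12B"

def build_messages(system: str, user: str) -> list[dict]:
    """Assemble a chat-style message list; the tokenizer's chat
    template decides how these render into the final prompt."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def generate_reply(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a reply. Note: the ~12B-parameter
    weights require a large download and substantial RAM/VRAM."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    messages = build_messages("You are a helpful assistant.", user_prompt)
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The heavy model load is kept inside `generate_reply` so the prompt-building helper can be used (and inspected) without downloading weights.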

Key Capabilities & Performance

This model is designed for general language tasks. Its scores on the Open LLM Leaderboard indicate its standing across standard benchmarks. Noteworthy scores include:

  • IFEval (0-Shot): 33.55
  • BBH (3-Shot): 28.04
  • MMLU-PRO (5-shot): 17.39

These metrics provide insight into its reasoning, instruction following, and general knowledge capabilities. The model's architecture and training aim for versatility in language understanding and generation.

Good For

  • General-purpose text generation: Suitable for a wide array of tasks requiring coherent and contextually relevant text output.
  • Exploratory AI applications: Developers can leverage its general capabilities for various language-based projects.
  • Long context processing: Its 32,768-token context window makes it suitable for tasks requiring understanding or generation over extended inputs.
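For long-context use, the fixed window has to cover both the input and the generated continuation. A minimal budgeting sketch (pure arithmetic; the 32,768 constant comes from the model card, the helper name is illustrative):

```python
CONTEXT_LENGTH = 32_768  # Tess-3-Mistral-Nemo-12B context window, in tokens

def max_new_tokens(prompt_tokens: int, reserve: int = 0) -> int:
    """Tokens left for generation after the prompt (and an optional
    safety reserve) are subtracted from the fixed context window."""
    remaining = CONTEXT_LENGTH - prompt_tokens - reserve
    return max(remaining, 0)
```

For example, a 30,000-token document leaves at most 2,768 tokens for the model's reply; a prompt that already exceeds the window leaves none.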