Tess-3-Mistral-Nemo-12B Overview
Tess-3-Mistral-Nemo-12B is a 12-billion-parameter general-purpose large language model developed by Migel Tissera as part of the "Tesoro" (Tess) series, with compute resources sponsored by KindoAI. It supports a context length of 32,768 tokens, enabling it to process and generate long sequences of text.
Key Capabilities & Performance
This model is designed for general language tasks. Its Open LLM Leaderboard results indicate its current standing across standard benchmarks. Reported scores include:
- IFEval (0-Shot): 33.55
- BBH (3-Shot): 28.04
- MMLU-PRO (5-Shot): 17.39
These benchmarks probe instruction following (IFEval), multi-step reasoning (BBH), and broad domain knowledge (MMLU-PRO). The model's architecture and training aim for versatility in language understanding and generation.
Good For
- General-purpose text generation: Suitable for a wide array of tasks requiring coherent and contextually relevant text output.
- Exploratory AI applications: Developers can leverage its general capabilities for various language-based projects.
- Long context processing: Its 32,768-token context window suits tasks that require understanding or generating text over extended inputs.
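When building on the long-context capability above, it helps to budget the 32,768-token window before sending a request, since the prompt and the generated output share the same window. The sketch below is a rough pre-flight check, not the model's actual tokenizer: the 4-characters-per-token ratio is a common English-text heuristic and is an assumption here, as are the function and constant names.

```python
# Hypothetical pre-flight check for Tess-3-Mistral-Nemo-12B's context window.
# The chars-per-token ratio is a rough heuristic, NOT the real tokenizer;
# for exact counts, tokenize the prompt with the model's own tokenizer.

MAX_CONTEXT = 32768  # context length stated for Tess-3-Mistral-Nemo-12B


def fits_in_context(prompt: str, max_new_tokens: int,
                    chars_per_token: float = 4.0) -> bool:
    """Estimate prompt tokens from character count and verify that the
    prompt plus the requested generation budget stays within the window."""
    est_prompt_tokens = len(prompt) / chars_per_token
    return est_prompt_tokens + max_new_tokens <= MAX_CONTEXT


# A short prompt easily fits alongside a 1024-token generation budget.
print(fits_in_context("Summarize the following report:", 1024))   # True
# ~200k characters (~50k estimated tokens) overflows the window.
print(fits_in_context("x" * 200_000, 1024))                       # False
```

For production use, replace the heuristic with a real token count from the model's tokenizer, since character-based estimates can be off by a large margin on code or non-English text.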