Zardos/Kant-Test-0.1-Mistral-7B
Zardos/Kant-Test-0.1-Mistral-7B is a 7 billion parameter pretrained generative text model based on the Mistral-7B-v0.1 architecture developed by Mistral AI. It incorporates Grouped-Query Attention and Sliding-Window Attention, and outperforms Llama 2 13B on the benchmarks tested. The model is suitable for general text generation tasks, offering strong performance for its size.
Model Overview
Zardos/Kant-Test-0.1-Mistral-7B is a 7 billion parameter pretrained generative text model built on the Mistral-7B-v0.1 architecture developed by the Mistral AI team. The architecture combines efficient inference with strong performance, outperforming larger models such as Llama 2 13B across a range of benchmarks.
Key Architectural Features
- Grouped-Query Attention (GQA): Enhances inference speed and reduces memory requirements.
- Sliding-Window Attention (SWA): Optimizes attention mechanisms for longer sequences, improving efficiency.
- Byte-fallback BPE tokenizer: Provides robust tokenization.
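These architectural settings are exposed through the model configuration. The sketch below is a minimal way to inspect them with the Hugging Face transformers library; it assumes the model is available on the Hub under the Zardos/Kant-Test-0.1-Mistral-7B identifier and uses the standard Mistral configuration fields (num_key_value_heads for GQA, sliding_window for SWA).

```python
# Minimal sketch: inspect the GQA / SWA settings via the model config.
# Assumes the model is available on the Hugging Face Hub under this id.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Zardos/Kant-Test-0.1-Mistral-7B")

# Grouped-Query Attention: fewer key/value heads than query heads.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-Window Attention: each token attends to at most this many
# preceding tokens, keeping memory use bounded for long sequences.
print("sliding window:", config.sliding_window)
```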
Performance Highlights
Evaluated on the Open LLM Leaderboard, this model achieves an average score of 62.42. Notable benchmark results include:
- AI2 Reasoning Challenge (25-shot): 62.37
- HellaSwag (10-shot): 82.84
- MMLU (5-shot): 63.38
- Winogrande (5-shot): 78.30
Intended Use Cases
As a pretrained base model, Zardos/Kant-Test-0.1-Mistral-7B is well-suited for a wide range of general text generation tasks where a balance of performance and computational efficiency is desired. Developers should note that, as a base model, it does not include built-in moderation mechanisms.
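For completion-style generation, a minimal sketch using the transformers text-generation pipeline is shown below; the model id is taken from this card, while the prompt and generation parameters are illustrative assumptions.

```python
# Minimal generation sketch; assumes a GPU with enough memory for a 7B model
# (device_map="auto" and bfloat16 help it fit; a quantized variant also works).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Zardos/Kant-Test-0.1-Mistral-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base models complete text; they are not instruction-tuned, so phrase the
# prompt as something to continue rather than a chat-style instruction.
prompt = "The categorical imperative states that"
output = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```

Because the model has no built-in moderation, application-facing uses should add their own prompt formatting and output filtering downstream.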