Hugofernandez/Mistral-7B-v0.1-colab-sharded

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Nov 28, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

Hugofernandez/Mistral-7B-v0.1-colab-sharded is a re-sharded distribution of the Mistral-7B-v0.1 large language model developed by the Mistral AI team. This 7-billion-parameter generative text model uses Grouped-Query Attention and Sliding-Window Attention, and has been re-sharded into 6 parts to ease loading on machines with limited RAM, such as free Google Colab instances. According to Mistral AI, it outperforms Llama 2 13B on all tested benchmarks, making it well suited to general generative text tasks where resource efficiency matters.


Model Overview

Hugofernandez/Mistral-7B-v0.1-colab-sharded is a specialized distribution of the original Mistral-7B-v0.1 model, developed by the Mistral AI Team. This version has been re-sharded into 6 parts, an increase from the original 2, to enable more efficient loading and operation on systems with constrained memory resources, such as free tiers of Google Colab. The base model is a 7 billion parameter pretrained generative text model.
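A minimal loading sketch for a low-RAM environment such as a free Colab instance. It assumes the `transformers` and `accelerate` packages are installed; the helper name `load_sharded_mistral` is ours, not part of any library, and the keyword arguments shown are the standard Transformers options for keeping peak memory low while the 6 shards stream in:

```python
def load_sharded_mistral(model_id="Hugofernandez/Mistral-7B-v0.1-colab-sharded"):
    """Load the 6-shard checkpoint with a low peak-RAM configuration.

    Sketch only: assumes `transformers` and `accelerate` are installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",       # keep the checkpoint's dtype instead of upcasting to float32
        device_map="auto",        # let accelerate place layers on GPU/CPU as memory allows
        low_cpu_mem_usage=True,   # stream shards one at a time rather than all at once
    )
    return tokenizer, model
```

Because the checkpoint is split into 6 smaller files, each shard can be loaded and moved to its device before the next is read, which is what keeps this workable on a machine that could not hold the whole float32 model in RAM.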

Key Architectural Features

The Mistral-7B-v0.1 model incorporates several advanced transformer architecture choices to enhance performance and efficiency:

  • Grouped-Query Attention: Improves inference speed and reduces memory footprint.
  • Sliding-Window Attention: Allows for handling longer sequences more efficiently by restricting attention to a local window.
  • Byte-fallback BPE tokenizer: Provides robust tokenization across diverse text inputs.

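The sliding-window idea above can be illustrated with a small, self-contained sketch (not the model's actual implementation): each query position attends only to itself and the most recent positions within a fixed window. Mistral-7B uses a window of 4096 tokens; the example uses a window of 3 so the pattern is readable:

```python
def sliding_window_mask(seq_len, window):
    # mask[i][j] is True when query position i may attend to key position j:
    # j must not be in the future (j <= i), and it must lie no more than
    # `window - 1` steps in the past (i - j < window).
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With seq_len=5 and window=3, position 4 attends only to positions 2, 3, 4.
mask = sliding_window_mask(5, 3)
```

Restricting attention this way makes the per-token cost independent of total sequence length, which is what lets the model handle long inputs more efficiently than full causal attention.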
Performance Highlights

According to the original Mistral AI team, Mistral-7B-v0.1 demonstrates strong performance, outperforming Llama 2 13B across all tested benchmarks. This indicates its capability for various generative text tasks despite its smaller parameter count compared to some larger models.

Usage Considerations

As a pretrained base model, Mistral-7B-v0.1 ships without built-in moderation mechanisms, so users should add their own content moderation layer when deploying it in applications. To avoid KeyError or NotImplementedError issues at load time, use version 4.34.0 or newer of the Transformers library.
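The version requirement above can be enforced with a small guard before loading. This is a rough numeric comparison of dotted release numbers, not full PEP 440 parsing, and the helper name `meets_min_version` is ours:

```python
def meets_min_version(version, minimum="4.34.0"):
    """Return True if `version` is at least `minimum` (rough comparison,
    not full PEP 440 parsing; pre-release suffixes are mostly ignored)."""
    def parts(v):
        nums = []
        for p in v.split(".")[:3]:
            digits = "".join(ch for ch in p if ch.isdigit())
            nums.append(int(digits) if digits else 0)
        while len(nums) < 3:
            nums.append(0)
        return nums
    return parts(version) >= parts(minimum)

# Usage (assumes transformers is installed):
# import transformers
# if not meets_min_version(transformers.__version__):
#     raise RuntimeError("Upgrade: pip install -U 'transformers>=4.34.0'")
```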