state-spaces/mamba-790m-hf

Text Generation | Concurrency Cost: 1 | Model Size: 0.79B | Quant: BF16 | Ctx Length: 32k | Published: Mar 6, 2024 | Architecture: Transformer | 0.0K Cold

state-spaces/mamba-790m-hf is a 0.79 billion parameter causal language model from state-spaces, built on the Mamba architecture and compatible with the Hugging Face Transformers library. Mamba is a selective state space model whose compute scales linearly with sequence length, and this checkpoint targets general text generation. It is listed with a context length of 32768 tokens, which suits applications that process moderately long sequences. When the `causal-conv1d` and `mamba-ssm` packages are installed, the model uses their optimized CUDA kernels; otherwise it falls back to a slower eager implementation.
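Before loading the model, it can help to check whether the two kernel packages are present. The sketch below is one way to do that with the standard library only. It assumes the import names are `mamba_ssm` and `causal_conv1d`, and the helper name `missing_fast_path_modules` is hypothetical. It checks only that the modules can be found, not that their CUDA kernels actually load.

```python
import importlib.util

# Assumed import names for the pip packages `mamba-ssm` and `causal-conv1d`.
FAST_PATH_MODULES = ("mamba_ssm", "causal_conv1d")


def missing_fast_path_modules(modules=FAST_PATH_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_fast_path_modules()
    if missing:
        print("Not found:", ", ".join(missing), "(expect the eager fallback)")
    else:
        print("Both kernel packages are importable.")
```

An empty list means both packages were found; anything else names what is absent.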


Model Overview

state-spaces/mamba-790m-hf is a 0.79 billion parameter causal language model based on the Mamba architecture, provided by state-spaces. It is packaged for the Hugging Face Transformers library and offers an alternative to attention-based transformer models for sequence processing. The checkpoint weights are left unchanged from the original release; the repository adds a complete `config.json` and tokenizer so the model loads directly through Transformers.
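A minimal generation sketch follows, assuming a recent Transformers release that ships `MambaForCausalLM` and a working PyTorch install. The wrapper name `generate_text` and its defaults are illustrative, not part of the model's API. Imports are deferred into the function so the definition loads even where the libraries are missing.

```python
def generate_text(prompt, model_id="state-spaces/mamba-790m-hf", max_new_tokens=10):
    """Load the checkpoint and return one decoded continuation of `prompt`."""
    # Deferred imports: these are heavy and only needed when the function runs.
    from transformers import AutoTokenizer, MambaForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = MambaForCausalLM.from_pretrained(model_id)

    # Tokenize to PyTorch tensors and generate up to `max_new_tokens` new tokens.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Example call (downloads the checkpoint on first use):
# print(generate_text("Hey how are you doing?"))
```

The first call downloads the weights, so expect a delay and roughly the checkpoint's size in disk usage.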

Key Capabilities and Features

  • Mamba Architecture: Built on Mamba, a selective state space model whose compute scales linearly with sequence length, which can make long-sequence inference faster than with standard attention-based transformers.
  • Transformers Compatibility: Loads through the Hugging Face Transformers library, so familiar APIs such as `generate` work as usual.
  • Optimized Performance: Uses optimized CUDA kernels when the `causal-conv1d` and `mamba-ssm` packages are installed; falls back to a slower eager implementation if either is missing.
  • PEFT Finetuning Support: Can be finetuned with the `peft` library; keeping the model in float32 during finetuning is recommended for best results.
  • Context Length: Listed with a context window of 32768 tokens, allowing the model to process and generate longer text sequences.
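The PEFT point above can be sketched as a LoRA setup that keeps the base weights in float32. This is an illustration under assumptions: the target module names (`x_proj`, `embeddings`, `in_proj`, `out_proj`) and the rank are assumed values to verify against the loaded model, and `build_lora_model` is a hypothetical helper. It requires `torch`, `peft`, and `transformers`.

```python
# Assumed LoRA targets; verify against the names printed by model.named_modules().
LORA_TARGET_MODULES = ["x_proj", "embeddings", "in_proj", "out_proj"]


def build_lora_model(model_id="state-spaces/mamba-790m-hf", rank=8):
    """Return the base model wrapped with LoRA adapters, base weights in float32."""
    # Deferred imports: only needed when the function is actually called.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    # Load in float32, following the finetuning recommendation above.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

    config = LoraConfig(
        r=rank,
        target_modules=LORA_TARGET_MODULES,
        task_type="CAUSAL_LM",
        bias="none",
    )
    return get_peft_model(model, config)


# Example (downloads the checkpoint):
# build_lora_model().print_trainable_parameters()
```

The returned model can then be passed to a training loop or trainer of your choice.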

Good For

  • Text Generation: Suitable for a range of text generation tasks, including conversational AI, content creation, and short code snippets.
  • Research and Experimentation: A convenient starting point for researchers and developers who want to explore the Mamba architecture's behavior within the Hugging Face ecosystem.
  • Resource-Efficient Deployment: At 0.79 billion parameters, it is a reasonable choice where compute or memory is constrained, and its architecture keeps inference cost growing linearly with sequence length.
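As a rough guide to the resource footprint, the memory needed for the weights alone is the parameter count times the bytes per parameter. The sketch below assumes 2 bytes per parameter for BF16 and 4 for float32. It ignores activations, the recurrent state, and framework overhead, so real usage will be higher.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}

NUM_PARAMS = 0.79e9  # parameter count stated for this model


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "fp32"):
        print(f"{dtype}: about {weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

This gives about 1.58 GB in BF16 and about 3.16 GB in float32, which is the relevant figure when finetuning in float32.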