Name: state-spaces/mamba-790m-hf API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: state-spaces

Model Overview

state-spaces/mamba-790m-hf is a 0.79-billion-parameter causal language model built on the Mamba architecture and published by state-spaces. It is compatible with the Hugging Face Transformers library and offers an alternative to transformer-based models for sequence processing. The checkpoints are unchanged from the original release; the repository adds a complete config.json and tokenizer so the model can be loaded directly.
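As a minimal sketch of the Transformers compatibility described above, the following loads the tokenizer and checkpoint and generates a short continuation. It assumes that torch and a transformers release that ships MambaForCausalLM (4.39 or later) are installed. The helper name generate_text, the default max_new_tokens=10, and the use of skip_special_tokens are illustrative choices, not part of the listing.

```python
MODEL_ID = "state-spaces/mamba-790m-hf"


def generate_text(prompt: str, max_new_tokens: int = 10) -> str:
    """Generate a continuation of `prompt` with the Mamba checkpoint.

    Hypothetical helper for illustration; not part of any library.
    """
    # Imported here so the heavy dependencies load only when generation is requested.
    from transformers import AutoTokenizer, MambaForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = MambaForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into a batch containing one sequence of token IDs.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

    # generate() returns the prompt tokens followed by the newly generated tokens.
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling print(generate_text("Hey how are you doing?")) downloads the weights on first use and prints the prompt followed by up to ten new tokens.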

Key Capabilities and Features

Mamba Architecture: Uses the Mamba architecture, a selective state space model whose compute scales linearly with sequence length. This makes long sequences cheaper to process and can make inference faster than with standard transformers.
Transformers Compatibility: Works with the Hugging Face Transformers ecosystem, so familiar APIs such as generate can be used as usual.
Optimized Performance: Uses the CUDA kernels from the causal-conv1d and mamba-ssm packages when they are installed; otherwise it falls back to a slower eager implementation.
PEFT Finetuning Support: Can be finetuned with the peft library; keeping the model in float32 during finetuning is recommended for best results.
Context Length: Supports a context window of 32768 tokens, so the model can process and generate longer text sequences.
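To see which path the model is likely to take, a rough check is whether the two optional kernel packages can be imported at all. The sketch below uses only the standard library and assumes the import names mamba_ssm and causal_conv1d; the function name optimized_kernels_available is hypothetical. It confirms only that the packages are installed, not that a CUDA device is present or that the kernels were built for it.

```python
import importlib.util


def optimized_kernels_available() -> dict[str, bool]:
    """Report whether the optional CUDA kernel packages can be imported.

    Hypothetical helper for illustration; not part of any library.
    """
    # Assumed import names of the mamba-ssm and causal-conv1d pip packages.
    modules = ("mamba_ssm", "causal_conv1d")
    # find_spec returns None when a top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = optimized_kernels_available()
if all(status.values()):
    print("Both kernel packages found; the optimized path can be used.")
else:
    missing = [name for name, found in status.items() if not found]
    print(f"Missing {missing}; expect the slower eager implementation.")
```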

Good For

Text Generation: Suitable for text generation tasks such as conversational AI, content creation, and short code snippets.
Research and Experimentation: Ideal for researchers and developers interested in exploring the Mamba architecture's performance and capabilities within the Hugging Face ecosystem.
Resource-Efficient Deployment: At 0.79 billion parameters, the model is a reasonable choice for applications with limited compute or memory, while the Mamba architecture keeps inference efficient.
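For a sense of scale, the parameter count translates into a rough weight footprint by multiplying by the bytes per parameter of the chosen dtype. The sketch below is back-of-the-envelope arithmetic only: it covers the weights alone and ignores activations, the recurrent state, gradients, and optimizer state. The function name weight_memory_gb is illustrative.

```python
PARAMS = 0.79e9  # parameter count stated in the listing

# Storage size of one parameter for common dtypes, in bytes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
# float32: 3.16 GB
# float16: 1.58 GB
# bfloat16: 1.58 GB
```

The float32 figure is the relevant one when following the recommendation to finetune in float32.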
