ibm-ai-platform/Bamba-9B-v1
Bamba-9B-v1 is a 9-billion-parameter decoder-only language model developed by ibm-ai-platform and based on the Mamba-2 architecture. It was trained from scratch in two stages on a total of 2.2 trillion tokens, including the Dolma v1.7 dataset. The model targets general-purpose language understanding and text generation, and offers a 32768-token context length.
Overview
Bamba-9B-v1 is a 9-billion-parameter decoder-only language model developed by ibm-ai-platform and built on the Mamba-2 architecture. It was trained in two stages: first on 2 trillion tokens from the Dolma v1.7 dataset, then on an additional 200 billion tokens from a curated high-quality blend. The second stage is intended to refine performance and improve output quality across diverse text generation tasks.
Key Capabilities
- Mamba-2 Architecture: Utilizes the Mamba-2 state-space model architecture for efficient sequence processing.
- Extensive Pretraining: Trained on 2.2 trillion tokens in total (2 trillion from Dolma v1.7, then 200 billion from a curated high-quality blend).
- Text Generation: Designed to handle a wide range of text generation tasks.
- Quantization Support: FP8-quantized versions are available, reducing storage and inference memory relative to the unquantized weights.
- Hugging Face Integration: Fully integrated with Hugging Face Transformers for easy inference and fine-tuning.
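As an illustration of the Hugging Face integration listed above, here is a minimal inference sketch. It assumes a recent `transformers` release that includes Bamba support, a working PyTorch install, and enough memory for a 9-billion-parameter model. The `generate` helper and its default of 64 new tokens are hypothetical choices for this example, not part of the model card.

```python
MODEL_ID = "ibm-ai-platform/Bamba-9B-v1"  # repository name as given on this card


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return one completion for `prompt` (illustrative helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" uses the dtype stored in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    # Assumption: token_type_ids are not needed for this decoder-only model.
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Mamba is a snake with the following properties:")` downloads the weights on first use and returns the prompt followed by the generated continuation.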
Good For
- General Text Generation: Suitable for various applications requiring text output.
- Research and Development: Offers a Mamba-2 based model for exploring alternative architectures.
- Resource-Efficient Deployment: Quantized versions enable deployment in environments with memory constraints.
- Fine-tuning: Supports fine-tuning for specific downstream tasks using tools like SFT Trainer.
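To make the memory argument for quantized deployment concrete, the following back-of-the-envelope sketch estimates weight storage alone. It assumes the nominal 9 billion parameter count, 2 bytes per parameter for an unquantized 16-bit checkpoint, and 1 byte per parameter for FP8. Real checkpoint sizes will differ, since some layers may remain unquantized and runtime memory also includes activations and cached state.

```python
# Bytes needed to store one parameter in each format (assumed baseline: bf16).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 9e9  # nominal count implied by the "9B" name, not an exact figure

for dtype in ("bf16", "fp8"):
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
# prints:
# bf16: ~18 GB
# fp8: ~9 GB
```

Under these assumptions, FP8 roughly halves the weight footprint compared with a 16-bit checkpoint.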