ibm-ai-platform/Bamba-9B-v1

Text Generation | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 32k | Published: Dec 3, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

Bamba-9B-v1 is a 9 billion parameter decoder-only language model developed by ibm-ai-platform, based on the Mamba-2 architecture. It is trained from scratch in two stages on 2.2 trillion tokens, including the Dolma v1.7 dataset. It offers a 32,768-token context length and is suited to general-purpose language understanding and text generation.


Overview

Bamba-9B-v1 is a 9 billion parameter decoder-only language model developed by ibm-ai-platform, built upon the Mamba-2 architecture. It was trained using a two-stage process, initially on 2 trillion tokens from the Dolma v1.7 dataset, followed by an additional 200 billion tokens from a curated high-quality blend. This two-stage pretraining aims to refine performance and enhance output quality for diverse text generation tasks.

Key Capabilities

  • Mamba-2 Architecture: Utilizes the Mamba-2 state-space model architecture for efficient sequence processing.
  • Extensive Pretraining: Trained on a total of 2.2 trillion tokens (2 trillion in the first stage, 200 billion in the second) for broad language coverage.
  • Text Generation: Designed to handle a wide range of text generation tasks.
  • Quantization Support: Provides FP8 quantized versions for more efficient storage and inference. FP8 stores one byte per weight, roughly halving weight memory relative to 16-bit formats.
  • Hugging Face Integration: Fully integrated with Hugging Face Transformers for easy inference and fine-tuning.
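
As an illustration of the Hugging Face integration mentioned above, here is a minimal inference sketch. It assumes a recent Transformers release with Bamba support and PyTorch installed; the repository name is taken from this page, while the helper names (`build_generation_kwargs`, `generate`) and the generation settings are hypothetical choices for the example, not part of the model card.

```python
MODEL_ID = "ibm-ai-platform/Bamba-9B-v1"  # repository name as shown on this page


def build_generation_kwargs(max_new_tokens: int = 64) -> dict:
    """Return keyword arguments for `model.generate` (illustrative values)."""
    # Greedy decoding keeps the example deterministic; this is an assumption,
    # not a recommended setting from the model authors.
    return {"max_new_tokens": max_new_tokens, "do_sample": False}


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and tokenizer, then complete `prompt`."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies until generation is actually requested.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors; token type ids are not needed
    # for causal generation, so they are left out of the inputs.
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    output_ids = model.generate(**inputs, **build_generation_kwargs(max_new_tokens))

    # Decode the first (and only) sequence in the batch back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Once upon a time")` downloads the weights on first use, so it needs enough disk space and memory for a 9B-parameter model.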

Good For

  • General Text Generation: Suitable for various applications requiring text output.
  • Research and Development: Offers a Mamba-2 based model for exploring alternative architectures.
  • Resource-Efficient Deployment: Quantized versions enable deployment in environments with memory constraints.
  • Fine-tuning: Supports fine-tuning for specific downstream tasks using tools like the TRL SFTTrainer.
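
To make the resource-efficient deployment point concrete, the following back-of-the-envelope sketch estimates weight memory at different precisions. It assumes a nominal 9 billion parameters and a 16-bit baseline for comparison (both assumptions for illustration), and it counts weights only: activations, caches, and runtime overhead are ignored.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 9e9  # nominal parameter count; the exact figure may differ

fp8_gib = weight_memory_gib(NUM_PARAMS, 1)   # FP8: 1 byte per parameter
bf16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit baseline: 2 bytes per parameter

print(f"FP8 weights:    {fp8_gib:.1f} GiB")
print(f"16-bit weights: {bf16_gib:.1f} GiB")
```

Under these assumptions the FP8 weights take about 8.4 GiB against about 16.8 GiB at 16 bits, which is where the "roughly half" figure comes from.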