ibm-ai-platform/Bamba-9B-v2

Task: Text Generation | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 32k | Published: Apr 25, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

Bamba-9B-v2 is a 9.78 billion parameter decoder-only language model developed by ibm-ai-platform, built on the Mamba-2 architecture. Trained for an additional 1 trillion tokens over its predecessor, this model demonstrates improved performance, outperforming Llama 3.1 8B on L1 and L2 leaderboards despite being trained on significantly less data. It is designed for a wide range of text generation tasks and offers efficient inference capabilities.

Bamba-9B-v2: A Mamba-2 Architecture Model

Bamba-9B-v2 is a 9.78 billion parameter decoder-only language model from ibm-ai-platform, built on the efficient Mamba-2 architecture. It improves on Bamba v1 through training on an additional 1 trillion tokens, which brings its total pretraining to 3.1 trillion tokens.

Key Capabilities & Performance

  • Enhanced Performance: Bamba-9B-v2's L1 and L2 leaderboard scores exceed those of Llama 3.1 8B, which was trained on nearly five times as much data.
  • General Text Generation: Designed to handle a broad spectrum of text generation tasks.
  • Efficient Inference: The Mamba-2 architecture supports efficient inference, and FP8 quantization further improves memory use and speed, reducing memory usage from 39.12 GB to 10.83 GB.
  • Benchmark Scores: Achieves 67.92 on MMLU (5-shot), 63.57 on ARC-C (25-shot), and 41.70 on GSM8K (5-shot).
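
The memory figures above can be sanity-checked with simple arithmetic. The sketch below assumes that the 39.12 GB figure corresponds to storing all 9.78 billion parameters at 4 bytes each (32-bit), and that FP8 uses 1 byte per parameter; neither assumption is stated in the model description, they are inferred from the numbers. The helper name `weights_gb` is hypothetical.

```python
# Back-of-the-envelope check of the reported memory figures.
# Assumption (not stated in the source): 39.12 GB = 32-bit weights,
# and FP8 stores one byte per parameter.

PARAMS = 9.78e9            # parameter count from the model description
REPORTED_BEFORE_GB = 39.12  # reported size before quantization
REPORTED_FP8_GB = 10.83     # reported size after FP8 quantization


def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weights_gb(PARAMS, 4)  # 4 bytes per parameter
fp8_gb = weights_gb(PARAMS, 1)   # 1 byte per parameter

# Ratio between the two reported figures.
reduction = REPORTED_BEFORE_GB / REPORTED_FP8_GB

# Gap between the reported FP8 size and the pure 1-byte estimate.
overhead_gb = REPORTED_FP8_GB - fp8_gb

print(f"32-bit estimate: {fp32_gb:.2f} GB")
print(f"FP8 estimate:    {fp8_gb:.2f} GB")
print(f"Reported reduction: {reduction:.2f}x")
print(f"Reported FP8 size exceeds the 1-byte estimate by {overhead_gb:.2f} GB")
```

Under these assumptions the 32-bit estimate matches the reported 39.12 GB, while the reported FP8 size sits about 1 GB above the pure 1-byte-per-parameter estimate. One plausible explanation, offered only as a guess, is that some tensors are kept at higher precision after quantization.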

Development & Integration

  • Training: Trained with FSDP using the official Mamba implementation; resources for reproducing the training run are available.
  • Hugging Face Integration: Fully integrated with Hugging Face transformers for straightforward inference.
  • Quantization: Supports FP8 quantization via fms-model-optimizer for reduced memory footprint and faster inference.
  • llama.cpp Support: Preliminary work is underway to enable Bamba models on llama.cpp for CPU-only inference, with GGUF conversion tools available.
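
As an illustration of the Hugging Face integration mentioned above, the following is a minimal sketch of text generation with the standard `transformers` auto classes. It assumes an installed `transformers` release that includes Bamba support, plus PyTorch and enough memory to hold the weights. The `generate` wrapper, its default of 64 new tokens, and the sample prompt are illustrative choices, not part of the model's documentation.

```python
# Minimal generation sketch using the generic transformers auto classes.
# Assumes transformers (with Bamba support) and PyTorch are installed.

MODEL_ID = "ibm-ai-platform/Bamba-9B-v2"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Download the model on first use and return the generated text."""
    # Imported here so that defining the function does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # token_type_ids are not needed for causal generation, so skip them.
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the full model weights):
# print(generate("Welcome to the future of"))
```

Loading the model inside the function keeps the sketch short; a real application would likely load the tokenizer and model once and reuse them across calls.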