ibm-ai-platform/Bamba-9B-v2
Bamba-9B-v2 is a 9.78 billion parameter decoder-only language model developed by ibm-ai-platform, built on the Mamba-2 architecture. Trained for an additional 1 trillion tokens over its predecessor, this model demonstrates improved performance, outperforming Llama 3.1 8B on L1 and L2 leaderboards despite being trained on significantly less data. It is designed for a wide range of text generation tasks and offers efficient inference capabilities.
Bamba-9B-v2: A Mamba-2 Architecture Model
Bamba-9B-v2 is a 9.78 billion parameter decoder-only language model from ibm-ai-platform, built on the efficient Mamba-2 architecture. This version improves significantly on Bamba v1: it was trained on an additional 1 trillion tokens, bringing total pretraining to 3.1 trillion tokens.
Key Capabilities & Performance
- Enhanced Performance: Bamba-9B-v2's L1 and L2 leaderboard scores exceed those of Llama 3.1 8B, which was trained on nearly five times as much data.
- General Text Generation: Designed to handle a broad spectrum of text generation tasks.
- Efficient Inference: The Mamba-2 architecture supports efficient inference, and FP8 quantization is available for further memory and speed gains, reducing memory usage from 39.12 GB to 10.83 GB.
- Benchmark Scores: Achieves 67.92 on MMLU (5-shot), 63.57 on ARC-C (25-shot), and 41.70 on GSM8K (5-shot).
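The memory figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes the 39.12 GB baseline corresponds to storing all 9.78 billion parameters at 4 bytes each (32-bit floats) and that 1 GB means 10^9 bytes; the card does not state either, so treat both as assumptions. The helper name `weights_gb` is made up for this illustration.

```python
# Back-of-the-envelope check of the memory figures quoted in the card.
# Assumptions (not stated in the card): 4 bytes per parameter for the
# unquantized baseline, 1 byte per parameter for ideal FP8, 1 GB = 1e9 bytes.

N_PARAMS = 9.78e9          # parameter count from the card
REPORTED_BEFORE_GB = 39.12  # reported memory before quantization
REPORTED_AFTER_GB = 10.83   # reported memory after FP8 quantization


def weights_gb(n_params: float, bytes_per_param: int) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


fp32_gb = weights_gb(N_PARAMS, 4)       # 39.12, matches the reported baseline
fp8_ideal_gb = weights_gb(N_PARAMS, 1)  # 9.78, lower bound if every weight were FP8

reduction_pct = (1 - REPORTED_AFTER_GB / REPORTED_BEFORE_GB) * 100
compression_ratio = REPORTED_BEFORE_GB / REPORTED_AFTER_GB

print(f"baseline at 4 bytes/param: {fp32_gb:.2f} GB")
print(f"ideal FP8 at 1 byte/param: {fp8_ideal_gb:.2f} GB")
print(f"reported reduction: {reduction_pct:.1f}% ({compression_ratio:.2f}x smaller)")
```

Under these assumptions the reported saving is roughly 72%, or about 3.6x. The reported 10.83 GB sits slightly above the ideal 9.78 GB, possibly because some tensors are kept at higher precision, though the card does not say.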
Development & Integration
- Training: Trained using FSDP with the official Mamba implementation, with resources available for reproduction.
- Hugging Face Integration: Fully integrated with Hugging Face `transformers` for straightforward inference.
- Quantization: Supports FP8 quantization via `fms-model-optimizer` for reduced memory footprint and faster inference.
- llama.cpp Support: Preliminary work is underway to enable Bamba models on `llama.cpp` for CPU-only inference, with GGUF conversion tools available.
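As an illustration of the Hugging Face integration, here is a minimal inference sketch using the standard `transformers` auto classes. It assumes an installed `transformers` release that includes Bamba support, plus PyTorch and enough memory to hold the model. The function name `generate_text` and its defaults are hypothetical and chosen only for this example. The imports are deferred into the function so that defining it triggers no download.

```python
MODEL_ID = "ibm-ai-platform/Bamba-9B-v2"  # repository name from this card


def generate_text(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Load the model and tokenizer, then generate a continuation of `prompt`.

    Illustrative helper, not part of any library. Calling it downloads the
    model weights on first use.
    """
    # Deferred import: only needed when the function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize to PyTorch tensors; token type IDs are not needed for generation.
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)

    # Generate up to `max_new_tokens` tokens beyond the prompt.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence back to text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

A call such as `generate_text("Mamba is a snake with the following properties")` would return the prompt followed by the model's continuation. On a GPU machine you would typically also pass a `torch_dtype` and move the model and inputs to the device, which this sketch leaves out for brevity.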