Name: ibm-ai-platform/Bamba-9B-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ibm-ai-platform

Bamba-9B-v2: A Mamba-2 Architecture Model

Bamba-9B-v2 is a 9.78-billion-parameter decoder-only language model from ibm-ai-platform, built on the Mamba-2 architecture. It improves on Bamba v1 by training on an additional 1 trillion tokens, bringing total pretraining to 3.1 trillion tokens.

Key Capabilities & Performance

Enhanced Performance: Bamba-9B-v2's L1 and L2 leaderboard scores exceed those of Llama 3.1 8B, which was trained on nearly five times as much data.
General Text Generation: Designed to handle a broad spectrum of text generation tasks.
Efficient Inference: The Mamba-2 architecture supports efficient inference, and FP8 quantization further improves speed and reduces memory usage from 39.12 GB to 10.83 GB.
Benchmark Scores: Achieves 67.92 on MMLU (5-shot), 63.57 on ARC-C (25-shot), and 41.70 on GSM8K (5-shot).
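
The memory figures above can be sanity-checked with simple arithmetic. The sketch below assumes that the 39.12 GB figure corresponds to storing all 9.78 billion parameters as 32-bit floats (4 bytes each); the listing does not state this, but the numbers line up. The helper name weight_memory_gb is hypothetical and used only for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring activations and caches."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 9.78e9  # parameter count stated in the listing

# Assumption: the unquantized checkpoint is held in 32-bit floats (4 bytes per parameter).
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)

# Lower bound if every parameter were stored in FP8 (1 byte per parameter).
fp8_gb = weight_memory_gb(NUM_PARAMS, 1)

# Reduction factor implied by the figures quoted in the listing.
reported_reduction = 39.12 / 10.83

print(f"32-bit weights:   {fp32_gb:.2f} GB")
print(f"all-FP8 weights:  {fp8_gb:.2f} GB")
print(f"reported savings: {reported_reduction:.2f}x")
```

The reported 10.83 GB sits slightly above the all-FP8 lower bound, which would be consistent with some tensors remaining at higher precision, though the listing does not say which.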

Development & Integration

Training: Trained using FSDP (PyTorch Fully Sharded Data Parallel) with the official Mamba implementation; resources for reproducing the training are available.
Hugging Face Integration: Fully integrated with Hugging Face transformers for straightforward inference.
Quantization: Supports FP8 quantization via fms-model-optimizer for reduced memory footprint and faster inference.
llama.cpp Support: Preliminary work is underway to enable Bamba models on llama.cpp for CPU-only inference, with GGUF conversion tools available.
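
As an illustration of the Hugging Face integration mentioned above, the following is a minimal sketch of text generation with the transformers library. It assumes a transformers release that includes Bamba support and enough memory to hold the weights; the function name generate, the prompt, and the max_new_tokens default are illustrative choices, not part of the model card.

```python
MODEL_ID = "ibm-ai-platform/Bamba-9B-v2"  # repository name as given in the listing


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return the prompt plus a generated continuation."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Loads with the library's default precision; a lower-precision dtype
    # can be requested here if memory is tight.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # generate() returns a batch of token-id sequences; decode the first one.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the full checkpoint on first use):
# print(generate("Hello! I am Bamba-9B-v2. What's your name?"))
```

The first call downloads the checkpoint, so the example invocation is left commented out.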
