PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed
PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed is a 2.6 billion parameter instruction-tuned causal language model, based on Google's Gemma 1.1 2B IT, that has been compressed using 8-bit quantization (llm-int8) by PrunaAI. This model is optimized for reduced memory footprint and faster inference, making it suitable for resource-constrained environments. It maintains the core capabilities of the original Gemma 1.1 2B IT while offering efficiency gains for deployment.
PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed Overview
This model is a compressed version of Google's Gemma 1.1 2B IT, developed by PrunaAI. The primary focus of this model is to provide a more efficient alternative to the base model by reducing its size and improving inference speed. It achieves this through 8-bit quantization using llm-int8 compression techniques.
Key Capabilities & Features
- 8-bit Quantization: The model has been compressed using llm-int8, significantly reducing its memory footprint.
- Efficiency-focused: Designed to be cheaper, smaller, faster, and greener for AI applications.
- Base Model Fidelity: Aims to retain the core instruction-following capabilities of the original Gemma 1.1 2B IT model.
- Safetensors Format: Utilizes the safetensors format for model weights.
- Calibration Data: WikiText was used as calibration data for the compression process where needed.
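To make the 8-bit idea above concrete, here is a minimal sketch of per-row absmax quantization, the core scaling step behind llm-int8 (this simplification omits llm-int8's mixed-precision outlier handling, and the function names are illustrative, not part of any library):

```python
def quantize_absmax(row):
    """Scale a row of floats into the signed int8 range [-127, 127]."""
    # Per-row scale factor; `or 1.0` avoids division by zero on all-zero rows.
    scale = max(abs(x) for x in row) / 127.0 or 1.0
    return [round(x / scale) for x in row], scale

def dequantize(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_absmax(weights)
approx = dequantize(codes, scale)
# Each code fits in one byte instead of four (fp32) or two (fp16),
# and the round-trip error per value is bounded by roughly one scale step.
```

This is where the memory saving comes from: weights stored as int8 take a quarter of the space of fp32, at the cost of a small, bounded rounding error per value.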
Good For
- Resource-Constrained Deployments: Ideal for environments where memory and computational resources are limited.
- Faster Inference: Suited to latency-sensitive applications, since the quantized model typically responds faster than the full-precision base model.
- Cost-Effective AI: Offers a more economical solution for running instruction-tuned language models.
- Experimentation with Quantization: Provides a readily available example of an 8-bit quantized Gemma model for developers to test and integrate.
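For developers wanting to try the model, a minimal loading sketch with the transformers library might look like the following (assumes a CUDA-capable GPU with bitsandbytes installed; since the 8-bit quantization config ships with the checkpoint, a plain from_pretrained call is typically sufficient — the helper function names here are illustrative, not part of PrunaAI's API):

```python
MODEL_ID = "PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed"

def load_smashed_model():
    # Downloads the 8-bit weights from the Hugging Face Hub on first use.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    return tokenizer, model

def chat(tokenizer, model, prompt, max_new_tokens=128):
    # Gemma's instruction format is applied via the tokenizer's chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Usage would be along the lines of `tokenizer, model = load_smashed_model()` followed by `chat(tokenizer, model, "Explain quantization in one sentence.")`; exact throughput and memory figures will depend on your hardware.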