PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed
PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed is a 2.6 billion parameter instruction-tuned causal language model, based on Google's Gemma 1.1 2B IT, that has been compressed using 8-bit quantization (llm-int8) by PrunaAI. This model is optimized for reduced memory footprint and faster inference, making it suitable for resource-constrained environments. It maintains the core capabilities of the original Gemma 1.1 2B IT while offering efficiency gains for deployment.
PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed Overview
This model is a compressed version of Google's Gemma 1.1 2B IT, developed by PrunaAI. The primary focus of this model is to provide a more efficient alternative to the base model by reducing its size and improving inference speed. It achieves this through 8-bit quantization using llm-int8 compression techniques.
Key Capabilities & Features
- 8-bit Quantization: The model has been compressed using llm-int8, significantly reducing its memory footprint.
- Efficiency-focused: Designed to be cheaper, smaller, faster, and greener for AI applications.
- Base Model Fidelity: Aims to retain the core instruction-following capabilities of the original Gemma 1.1 2B IT model.
- Safetensors Format: Utilizes the safetensors format for model weights.
- Calibration Data: WikiText was used as calibration data for the compression process where needed.
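To make the 8-bit idea above concrete, here is a minimal sketch of per-row absmax quantization, the core scaling step behind llm-int8 (this simplification omits llm-int8's mixed-precision outlier handling, and the function names are illustrative, not part of any library):

```python
def quantize_absmax(row):
    """Scale a row of floats into the signed int8 range [-127, 127]."""
    # Per-row scale factor; `or 1.0` avoids division by zero on all-zero rows.
    scale = max(abs(x) for x in row) / 127.0 or 1.0
    return [round(x / scale) for x in row], scale

def dequantize(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_absmax(weights)
approx = dequantize(codes, scale)
# Each code fits in one byte instead of four (fp32) or two (fp16),
# and the round-trip error per value is bounded by roughly one scale step.
```

This is where the memory saving comes from: weights stored as int8 take a quarter of the space of fp32, at the cost of a small, bounded rounding error per value.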
Good For
- Resource-Constrained Deployments: Ideal for environments where memory and computational resources are limited.
- Faster Inference: Suited to latency-sensitive applications, since the quantized model typically responds faster than the full-precision base model.
- Cost-Effective AI: Offers a more economical solution for running instruction-tuned language models.
- Experimentation with Quantization: Provides a readily available example of an 8-bit quantized Gemma model for developers to test and integrate.
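For developers wanting to try the model, a minimal loading sketch with the transformers library might look like the following (assumes a CUDA-capable GPU with bitsandbytes installed; since the 8-bit quantization config ships with the checkpoint, a plain from_pretrained call is typically sufficient — the helper function names here are illustrative, not part of PrunaAI's API):

```python
MODEL_ID = "PrunaAI/gemma-1.1-2b-it-bnb-8bit-smashed"

def load_smashed_model():
    # Downloads the 8-bit weights from the Hugging Face Hub on first use.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    return tokenizer, model

def chat(tokenizer, model, prompt, max_new_tokens=128):
    # Gemma's instruction format is applied via the tokenizer's chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Usage would be along the lines of `tokenizer, model = load_smashed_model()` followed by `chat(tokenizer, model, "Explain quantization in one sentence.")`; exact throughput and memory figures will depend on your hardware.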