arnavgrg/llama-2-13b-chat-nf4-fp16-upscaled
Text generation · Concurrency cost: 1 · Model size: 13B · Quant: FP8 · Context length: 4K · License: apache-2.0 · Architecture: Transformer · Open weights

arnavgrg/llama-2-13b-chat-nf4-fp16-upscaled is an upscaled 13 billion parameter Llama-2-chat variant, developed by arnavgrg. This model features linear layers upscaled to FP16 after initial NF4 4-bit quantization, aiming to reduce inference-time quantization/dequantization overhead. It is designed for chat applications, offering a balance between performance and computational efficiency, though with some loss in fidelity compared to the original base model due to the quantization process.


Model Overview

This model, arnavgrg/llama-2-13b-chat-nf4-fp16-upscaled, is a specialized variant of Meta's Llama-2-13b-chat base model. It has undergone a two-step processing pipeline: NF4 4-bit quantization followed by upscaling of its linear layers to FP16.

Key Characteristics

  • Quantization Strategy: Initially quantized using NF4 (NormalFloat 4-bit) via bitsandbytes.
  • Upscaled Precision: The 4-bit linear layers are subsequently upscaled to FP16. This approach aims to mitigate the performance cost associated with on-the-fly quantization/dequantization during inference.
  • Lossy Conversion: It's important to note that the initial NF4 quantization is a lossy operation. Consequently, this model's performance will not be identical to the original, unquantized Llama-2-13b-chat base model.

Usage

This model can be loaded and utilized directly with the transformers library, specifying torch_dtype=torch.float16 for optimal use of its upscaled precision.
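A minimal loading sketch (the generation settings are illustrative, and the prompt helper follows the standard Llama-2-chat `[INST]` template):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arnavgrg/llama-2-13b-chat-nf4-fp16-upscaled"

def format_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap a message in the standard Llama-2-chat instruction template."""
    if system_prompt:
        return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
    return f"[INST] {user_message} [/INST]"

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # matches the upscaled FP16 linear layers
        device_map="auto",
    )
    prompt = format_prompt("Explain NF4 quantization in one paragraph.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Passing `torch_dtype=torch.float16` keeps the weights in the precision they were saved at, avoiding an unnecessary FP32 upcast on load.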

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p