arnavgrg/mistral-7b-nf4-fp16-upscaled

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 8K · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

The arnavgrg/mistral-7b-nf4-fp16-upscaled model is an FP16 variant of the Mistral-7B base model, upscaled after initial NF4 4-bit quantization. The approach reduces inference-time quantization/dequantization costs by converting the bitsandbytes Linear4bit layers to FP16. Because the initial quantization is lossy, the upscaled weights do not exactly match the originals; the model targets efficient deployment where FP16 throughput matters more than bit-exact fidelity. It suits users seeking a Mistral-7B derivative optimized for faster inference at the cost of a slight loss in fidelity from the initial quantization.


arnavgrg/mistral-7b-nf4-fp16-upscaled: Optimized Mistral-7B Variant

This model is an FP16 (16-bit floating-point) version of the original Mistral-7B base model, developed by arnavgrg. It was produced by first loading the base model with NF4 4-bit quantization via bitsandbytes, then upscaling (dequantizing) its Linear4bit layers to FP16.

Key Characteristics

  • Upscaled FP16 Variant: The defining feature is the upscaling of the bitsandbytes Linear4bit layers to FP16 after initial NF4 quantization (see the sketch after this list).
  • Inference Cost Reduction: The upscaling removes the quantization/dequantization work that 4-bit layers would otherwise perform on every forward pass at inference time.
  • Lossy Quantization: The initial NF4 quantization is lossy, so the linear-layer weights are not perfectly preserved. Consequently, this variant may not perform identically to the official, unquantized Mistral-7B base model.
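
Roughly, this upscaling can be reproduced by loading the base model in NF4 and dequantizing each Linear4bit layer back into a dense FP16 linear layer. The sketch below is illustrative, not the author's exact script; it assumes the bitsandbytes Linear4bit and dequantize_4bit APIs and the mistralai/Mistral-7B-v0.1 base checkpoint:

import torch
import torch.nn as nn
import bitsandbytes.functional as bnb_F
from bitsandbytes.nn import Linear4bit
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model with NF4 4-bit quantization (assumed to match the author's setup).
model = AutoModelForCausalLM.from_pretrained(
  "mistralai/Mistral-7B-v0.1",
  quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
)

def upscale_linear4bit(module: nn.Module) -> None:
    # Recursively replace every Linear4bit layer with a dense FP16 nn.Linear.
    for name, child in module.named_children():
        if isinstance(child, Linear4bit):
            # Dequantize the packed NF4 weight back to FP16 (a lossy round-trip).
            w = bnb_F.dequantize_4bit(child.weight.data, child.weight.quant_state).to(torch.float16)
            fp16 = nn.Linear(child.in_features, child.out_features, bias=child.bias is not None)
            fp16.weight = nn.Parameter(w)
            if child.bias is not None:
                fp16.bias = nn.Parameter(child.bias.data.to(torch.float16))
            setattr(module, name, fp16)
        else:
            upscale_linear4bit(child)

upscale_linear4bit(model)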

Usage

This model can be loaded directly using the transformers library in FP16, facilitating straightforward integration into existing workflows:

import torch
from transformers import AutoModelForCausalLM

# Load the upscaled checkpoint in half precision; device_map="auto"
# places the weights on the available GPU(s), falling back to CPU.
model = AutoModelForCausalLM.from_pretrained(
  "arnavgrg/mistral-7b-nf4-fp16-upscaled",
  device_map="auto",
  torch_dtype=torch.float16,
)
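
From there, generation follows the standard transformers pattern. A minimal example (the prompt is arbitrary):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arnavgrg/mistral-7b-nf4-fp16-upscaled")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))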

When to Use This Model

This model is particularly suited for scenarios where:

  • Faster Inference Is Critical: The FP16 upscaling removes the per-layer dequantization work, reducing computational cost during inference.
  • Half-Precision-Friendly Hardware: FP16 weights run efficiently on GPUs with native half-precision support, though they occupy roughly four times the memory of the NF4-packed weights (see the estimate below).
  • Acceptable Fidelity Trade-off: Users are willing to accept a minor reduction in output quality relative to the full-precision base model in exchange for speed.
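
As a rough back-of-envelope estimate of the memory trade-off (parameter count approximated as 7 billion, quantization metadata ignored):

params = 7e9
fp16_gib = params * 2 / 2**30    # ~13 GiB of weights at 2 bytes per parameter
nf4_gib = params * 0.5 / 2**30   # ~3.3 GiB packed at 4 bits per parameter
print(f"FP16: {fp16_gib:.1f} GiB, NF4: {nf4_gib:.1f} GiB")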

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model, spanning the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
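
Most of these map directly onto transformers generation arguments (frequency_penalty and presence_penalty are OpenAI-style API parameters rather than transformers keywords; min_p requires a recent transformers release). The values below are placeholders for illustration, not the actual Featherless user configurations:

outputs = model.generate(
  **inputs,
  do_sample=True,
  temperature=0.7,        # placeholder value, not a user statistic
  top_p=0.9,
  top_k=40,
  repetition_penalty=1.1,
  min_p=0.05,             # supported in recent transformers versions
  max_new_tokens=64,
)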