arnavgrg/mistral-7b-nf4-fp16-upscaled
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm
The arnavgrg/mistral-7b-nf4-fp16-upscaled model is an FP16 variant of the Mistral-7B base model, upscaled after an initial NF4 4-bit quantization. Converting the Linear4bit layers back to FP16 avoids the per-inference dequantization overhead of serving the model in 4-bit form. The initial quantization is lossy, so the upscaled weights do not exactly match the original FP16 base model; the variant is intended for deployments where FP16 throughput matters more than exact fidelity. It suits users seeking a Mistral-7B derivative optimized for faster inference at the cost of a slight loss in precision from the quantization round trip.
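The quantize-then-upscale round trip described above can be sketched in pure NumPy. This is a simplified illustration, not the model's actual conversion pipeline (which uses bitsandbytes Linear4bit layers): weights are split into blocks, absmax-normalized, snapped to a 16-entry NF4-style codebook (values as published for NF4, rounded here), then dequantized and cast to FP16. All function names below are hypothetical.

```python
import numpy as np

# NF4-style codebook: 16 levels placed at quantiles of a standard normal
# (approximate published NF4 values, rounded).
NF4_LEVELS = np.array([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
], dtype=np.float32)

def quantize_nf4(w, blocksize=64):
    """Blockwise absmax-normalize, then snap each value to its nearest codebook level."""
    flat = w.astype(np.float32).ravel()
    pad = (-len(flat)) % blocksize
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, blocksize)
    absmax = np.abs(blocks).max(axis=1, keepdims=True) + 1e-12
    normed = blocks / absmax
    # index of the nearest codebook entry for every element (4-bit storage)
    idx = np.abs(normed[..., None] - NF4_LEVELS).argmin(axis=-1).astype(np.uint8)
    return idx, absmax, pad

def dequantize_to_fp16(idx, absmax, pad, shape):
    """Look up codebook values, rescale per block, and cast to FP16 (the 'upscale')."""
    blocks = NF4_LEVELS[idx] * absmax
    flat = blocks.ravel()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape).astype(np.float16)

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64)).astype(np.float32)        # stand-in weight matrix
idx, absmax, pad = quantize_nf4(w)
w_fp16 = dequantize_to_fp16(idx, absmax, pad, w.shape)   # lossy FP16 reconstruction
```

After the round trip, `w_fp16` has the original shape in FP16 but differs slightly from `w`, which is the fidelity trade-off the model card refers to.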
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model.
temperature · top_p · top_k · frequency_penalty · presence_penalty · repetition_penalty · min_p
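The fields above are standard text-generation sampling parameters. As a hedged sketch, a request carrying them to an OpenAI-compatible completions endpoint might be built like this; the values are illustrative placeholders, not statistics from this page, and support for non-standard fields such as repetition_penalty and min_p depends on the serving provider.

```python
import json

# Illustrative sampler configuration; every numeric value here is a
# placeholder chosen for the example, not a recommended setting.
payload = {
    "model": "arnavgrg/mistral-7b-nf4-fp16-upscaled",
    "prompt": "Explain NF4 quantization in one sentence.",
    "max_tokens": 128,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,   # provider-specific extension
    "min_p": 0.05,               # provider-specific extension
}
body = json.dumps(payload)  # JSON request body for a POST to the endpoint
```

The serialized body would then be sent with an Authorization header to the provider's completions URL.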