RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV
Text generation · Model size: 8B · Quantization: FP8 · Context length: 8k · Architecture: Transformer · Concurrency cost: 1 · Published: May 20, 2024

RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV is an 8-billion-parameter instruction-tuned causal language model published by Red Hat AI and based on Meta-Llama-3. Both its weights and activations are quantized to FP8, and it supports an FP8 KV cache, making it well suited to efficient inference with vLLM. Despite the quantization it retains strong accuracy, scoring 74.98 on GSM8K (5-shot), so it is a good fit for resource-constrained deployments that require high throughput.
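As a sketch of how this model might be served, the command below launches it with vLLM's OpenAI-compatible server. It assumes vLLM is installed and a GPU with FP8 support is available; the flag values (`--kv-cache-dtype fp8` to enable the FP8 KV cache, `--max-model-len 8192` to match the 8k context window) are illustrative defaults, not a prescribed configuration.

```shell
# Sketch: serve the FP8 model with vLLM (assumes vLLM is installed and
# a GPU capable of FP8 inference is available).
vllm serve RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV \
    --kv-cache-dtype fp8 \
    --max-model-len 8192
```

The FP8 weight quantization itself is detected automatically from the checkpoint; only the KV-cache dtype needs to be requested explicitly.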
