mit-han-lab/Llama-3-8B-QServe-g128
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: May 5, 2024 · License: llama3 · Architecture: Transformer

The mit-han-lab/Llama-3-8B-QServe-g128 model is a Llama-3-8B variant developed by mit-han-lab and optimized for efficient serving with group-wise quantization at a group size of 128 (the "g128" suffix). The quantization reduces inference cost and latency while largely preserving the base model's quality, making the model suitable for applications that need high throughput and low-latency responses from a Llama-3-8B base.
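To illustrate what a group size of 128 means, the sketch below implements per-group symmetric integer quantization in NumPy: each consecutive group of 128 weights shares one scale factor, which tracks local weight magnitudes far better than a single per-tensor scale. This is a minimal, illustrative sketch of the general g128 idea only; it is not QServe's actual kernel or checkpoint format, and the 4-bit setting here is an assumption for demonstration.

```python
import numpy as np

def quantize_g128(w, group_size=128, bits=4):
    """Symmetric per-group quantization: one scale per group of `group_size` weights."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    groups = w.reshape(-1, group_size)
    # One scale per group, chosen so the largest weight in the group maps to qmax.
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover approximate float weights from integers and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Toy weight vector: 1024 weights -> 8 groups of 128, 8 scales total.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scales = quantize_g128(w)
w_hat = dequantize(q, scales)
```

Each group's reconstruction error is bounded by half its own scale, so groups of small weights are quantized much more finely than a per-tensor scheme would allow.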
