RedHatAI/Qwen2-72B-Instruct-FP8
Text Generation | Model size: 72.7B | Quantization: FP8 | Context length: 32k | Published: Jun 6, 2024 | License: tongyi-qianwen | Architecture: Transformer | Concurrency cost: 4

RedHatAI/Qwen2-72B-Instruct-FP8 is a 72.7-billion-parameter, Qwen2-based, instruction-tuned causal language model developed by Neural Magic. It is a quantized version of Qwen2-72B-Instruct, using FP8 quantization of both weights and activations to reduce memory footprint and improve inference efficiency. The model is intended for commercial and research use in English for assistant-like chat applications. Despite the reduced precision, the FP8 model achieves an average OpenLLM benchmark score of 80.34, slightly above the unquantized model's 79.97.
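For chat use, Qwen2 instruct models expect prompts in the ChatML format. Below is a minimal sketch of that format; `build_chatml_prompt` is a hypothetical helper for illustration, and in practice you would load the model's tokenizer and call `apply_chat_template` instead of formatting strings by hand.

```python
# Sketch of the ChatML-style prompt format used by Qwen2 instruct models.
# build_chatml_prompt is illustrative only; the model's tokenizer provides
# apply_chat_template, which should be preferred in real code.

def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize FP8 quantization in one sentence."},
])
print(prompt)
```

The completed prompt is then passed to the model (e.g. via a vLLM or Transformers generation call) and decoding stops at the next `<|im_end|>` token.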
