RedHatAI/gemma-2-9b-it-FP8
Text generation · Concurrency cost: 1 · Model size: 9B · Quantization: FP8 · Context length: 16K · Published: Jul 8, 2024 · License: Gemma · Architecture: Transformer

RedHatAI/gemma-2-9b-it-FP8 is a 9 billion parameter Gemma 2 model developed by Neural Magic (Red Hat), optimized with FP8 quantization of both weights and activations. It is a quantized version of google/gemma-2-9b-it, intended for efficient inference in assistant-like chat applications. It achieves an average OpenLLM benchmark score of 73.49, slightly outperforming its unquantized counterpart while roughly halving the memory footprint relative to the 16-bit original.
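The FP8 scheme described above maps each weight and activation onto the 8-bit E4M3 float grid (4 exponent bits, 3 mantissa bits, max finite value 448). A minimal numerical sketch of that rounding step in plain NumPy, assuming simplified E4M3 handling (the helper name `to_fp8_e4m3` is ours; this illustrates the format, not the model's actual quantization kernels):

```python
import numpy as np

# Largest finite value representable in FP8 E4M3.
FP8_E4M3_MAX = 448.0

def to_fp8_e4m3(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest E4M3-representable number
    (simplified sketch: subnormals approximated, NaN/inf not handled)."""
    sign = np.sign(x)
    a = np.clip(np.abs(x), 2.0**-9, FP8_E4M3_MAX)  # clamp into E4M3 range
    e = np.floor(np.log2(a))                        # power-of-two exponent
    m = np.round(a / 2.0**e * 8.0) / 8.0            # keep 3 mantissa bits
    return sign * m * 2.0**e

# Round-trip a weight-like tensor to see the error FP8 storage would incur.
w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
w_fp8 = to_fp8_e4m3(w)
print("max abs error:", np.abs(w - w_fp8).max())
```

Because E4M3 keeps only 3 mantissa bits, each value retains roughly two significant digits, which is why per-tensor FP8 quantization of transformer weights typically costs little accuracy while halving storage versus BF16.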
