RedHatAI/Meta-Llama-3-8B-Instruct-FP8
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:Apr 25, 2024License:llama3Architecture:Transformer0.0K Warm
The RedHatAI/Meta-Llama-3-8B-Instruct-FP8 model, developed by Neural Magic, is an 8 billion parameter instruction-tuned Llama-3 architecture optimized with FP8 weight and activation quantization. This optimization significantly reduces memory footprint and disk size by approximately 50% compared to its unquantized counterpart. Intended for commercial and research use, it excels in English-language assistant-like chat applications while maintaining strong performance, achieving an average OpenLLM benchmark score of 68.22.
Loading preview...
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.
temperature
top_p
–
top_k
–
frequency_penalty
–
presence_penalty
–
repetition_penalty
–
min_p
–