nvidia/Llama-3.1-8B-Instruct-FP8
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Aug 29, 2024License:llama3.1Architecture:Transformer0.0K Warm
The nvidia/Llama-3.1-8B-Instruct-FP8 model is an 8 billion parameter instruction-tuned language model, quantized to FP8 precision by NVIDIA using TensorRT Model Optimizer. This model is derived from Meta's Llama 3.1 8B Instruct and is optimized for efficient inference on NVIDIA hardware, offering approximately 1.3x speedup on H100 GPUs. It maintains strong performance across benchmarks like MMLU and GSM8K while significantly reducing memory footprint. This model is suitable for commercial and non-commercial use in applications requiring fast, resource-efficient text generation.
Loading preview...
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.
temperature
top_p
–
top_k
–
frequency_penalty
–
presence_penalty
–
repetition_penalty
–
min_p
–