nvidia/Llama-3.1-8B-Instruct-FP8
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Aug 29, 2024License:llama3.1Architecture:Transformer0.0K Warm

The nvidia/Llama-3.1-8B-Instruct-FP8 model is an 8 billion parameter instruction-tuned language model, quantized to FP8 precision by NVIDIA using TensorRT Model Optimizer. This model is derived from Meta's Llama 3.1 8B Instruct and is optimized for efficient inference on NVIDIA hardware, offering approximately 1.3x speedup on H100 GPUs. It maintains strong performance across benchmarks like MMLU and GSM8K while significantly reducing memory footprint. This model is suitable for commercial and non-commercial use in applications requiring fast, resource-efficient text generation.

Loading preview...

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p