RedHatAI/Qwen2-1.5B-Instruct-FP8
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: FP8 · Ctx Length: 32k · Published: Jun 14, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

RedHatAI/Qwen2-1.5B-Instruct-FP8 is a 1.5 billion parameter Qwen2-based instruction-tuned causal language model developed by Neural Magic. This model is an FP8 quantized version of Qwen2-1.5B-Instruct, optimized for reduced memory footprint and faster inference with vLLM. It maintains 98.93% of the original model's average performance on the OpenLLM benchmark, making it suitable for English-language assistant-like chat applications.
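As a rough illustration of the memory savings (back-of-envelope arithmetic, not figures from the model card): FP8 stores each weight in one byte versus two for BF16, so the weight footprint roughly halves.

```python
# Illustrative weight-memory estimate. Assumes all 1.5B parameters are
# quantized; real deployments may keep some layers (e.g. embeddings or
# lm_head) in higher precision, so actual savings can differ.
PARAMS = 1.5e9

def weight_gb(bytes_per_param: float) -> float:
    """Weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

bf16_gb = weight_gb(2.0)  # BF16: 2 bytes per parameter
fp8_gb = weight_gb(1.0)   # FP8: 1 byte per parameter

print(f"BF16 weights: {bf16_gb:.1f} GB")  # 3.0 GB
print(f"FP8 weights:  {fp8_gb:.1f} GB")   # 1.5 GB
```

Activations, KV cache, and runtime overhead add to this, so total serving memory is higher than the weight footprint alone.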


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p
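A minimal pure-Python sketch of how the first four parameters above shape the next-token distribution, following common sampler conventions (not Featherless internals). The penalty parameters (`frequency_penalty`, `presence_penalty`, `repetition_penalty`) are omitted for brevity; they adjust logits based on tokens already generated, before this filtering step.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Apply temperature, top_k, top_p, and min_p to raw logits and
    return a renormalized probability distribution (hypothetical helper
    for illustration)."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)

    # top_k: keep only the k most probable tokens (0 disables).
    if top_k > 0:
        keep &= set(order[:top_k])

    # top_p (nucleus): keep the smallest prefix whose mass reaches top_p.
    if top_p < 1.0:
        cum, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus

    # min_p: drop tokens below min_p times the top token's probability.
    if min_p > 0.0:
        cutoff = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= cutoff}

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

# Example: with top_k=2, the least likely of three tokens is zeroed out.
p = filter_probs([2.0, 1.0, 0.0], top_k=2)
```

Lower `temperature` sharpens the distribution toward the most likely token, while `top_k`, `top_p`, and `min_p` each truncate the tail in a different way before renormalization.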