RedHatAI/Qwen2-1.5B-Instruct-FP8
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Jun 14, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights
RedHatAI/Qwen2-1.5B-Instruct-FP8 is a 1.5 billion parameter Qwen2-based instruction-tuned causal language model developed by Neural Magic (now part of Red Hat). It is an FP8-quantized version of Qwen2-1.5B-Instruct, optimized for a reduced memory footprint and faster inference with vLLM. The quantized model retains 98.93% of the original model's average score on the OpenLLM benchmark, making it suitable for English-language assistant-style chat applications.
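Because the checkpoint ships FP8 weights targeted at vLLM, the most direct way to try it is vLLM's OpenAI-compatible server. A minimal deployment sketch (assuming vLLM is installed with its `vllm serve` entrypoint available and an FP8-capable GPU; the flags shown are illustrative, not required):

```shell
# Launch an OpenAI-compatible server for the model.
# --max-model-len matches the 32k context length listed above.
vllm serve RedHatAI/Qwen2-1.5B-Instruct-FP8 --max-model-len 32768

# Then query it from another shell:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "RedHatAI/Qwen2-1.5B-Instruct-FP8",
       "messages": [{"role": "user", "content": "Write a haiku about quantization."}]}'
```

On vLLM releases that predate the `vllm serve` command, the same server is started via the `vllm.entrypoints.openai.api_server` module with a `--model` argument.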
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model (no values were captured in this snapshot):

temperature: –
top_p: –
top_k: –
frequency_penalty: –
presence_penalty: –
repetition_penalty: –
min_p: –
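The settings listed above are standard decoding-time sampling controls. As an illustrative, framework-agnostic sketch (a conventional formulation, not vLLM's actual implementation), temperature, top_k, top_p, and min_p can be applied to a next-token logit distribution like this:

```python
import math

def sample_filter(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Filter a next-token distribution with common sampler settings.
    Returns a dict mapping surviving token indices to renormalized probs."""
    # Temperature: rescale logits before the softmax; <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # top_k: keep only the k most probable tokens (0 disables the filter).
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p (nucleus): keep the smallest prefix of the ranked tokens whose
    # cumulative probability mass reaches top_p.
    kept, cum = [], 0.0
    for i, p in ranked:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break

    # min_p: drop tokens whose probability is below min_p times the top
    # token's probability (0 disables the filter).
    cutoff = min_p * kept[0][1]
    kept = [(i, p) for i, p in kept if p >= cutoff]

    # Renormalize the survivors into a proper distribution.
    z = sum(p for _, p in kept)
    return {i: p / z for i, p in kept}
```

The repetition controls (frequency_penalty, presence_penalty, repetition_penalty) act earlier in the pipeline: they adjust the raw logits of already-generated tokens before the filtering above is applied.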