Qwen/Qwen2-72B
Text Generation
Concurrency Cost: 4
Model Size: 72.7B
Quant: FP8
Ctx Length: 32k
Published: May 22, 2024
License: tongyi-qianwen
Architecture: Transformer
0.2K Warm
Qwen2-72B is a 72.7 billion parameter dense decoder-only Transformer language model developed by the Qwen team. It features SwiGLU activation, attention QKV bias, and grouped query attention, with an improved tokenizer covering multiple natural languages and code. This base model demonstrates strong performance across language understanding, generation, multilingual tasks, coding, mathematics, and reasoning benchmarks, often surpassing other open-source models and competing with proprietary alternatives.
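As a base (non-instruct) model, Qwen2-72B is typically used for plain text completion. Below is a minimal sketch of loading it with Hugging Face transformers; the model ID comes from this page, while the dtype, device placement, and prompt are illustrative assumptions (running the 72.7B weights requires a multi-GPU host with substantial memory):

```python
# Minimal text-completion sketch with Hugging Face transformers.
# Assumes a machine with enough GPU memory for the 72.7B-parameter weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs. fp32
    device_map="auto",           # shard weights across available GPUs
)

# A base model does plain continuation, not chat, so prompt accordingly.
prompt = "The key ideas behind grouped query attention are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```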
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model. Each configuration is defined by the following sampler parameters:
temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p
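These parameters map directly onto an OpenAI-compatible completions request. A hedged sketch follows, assuming Featherless exposes the usual OpenAI-style API; the base URL and every sampler value shown are illustrative assumptions, not one of the actual top-3 configs:

```python
# Illustrative sampler settings for an OpenAI-compatible endpoint.
# The base URL and all values below are assumptions for demonstration,
# not the actual top-3 Featherless configs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

completion = client.completions.create(
    model="Qwen/Qwen2-72B",
    prompt="Once upon a time",
    max_tokens=256,
    temperature=0.8,          # illustrative values only
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={              # non-standard samplers go in the raw request body
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(completion.choices[0].text)
```

Note that top_k, repetition_penalty, and min_p are not part of the standard OpenAI schema, which is why the sketch passes them through extra_body rather than as named client arguments.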