Qwen/Qwen2.5-7B-Instruct-1M
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Jan 23, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen2.5-7B-Instruct-1M is a 7.61-billion-parameter instruction-tuned causal language model developed by Qwen, built on a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. The model is optimized for ultra-long-context tasks, natively supporting context lengths of up to 1 million tokens (served here with a 32k context window) while maintaining strong performance on shorter tasks. It uses sparse attention and length-extrapolation methods to process extensive text sequences with improved accuracy and speed.
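For reference, here is a minimal sketch of running the model locally with the Hugging Face transformers chat workflow; the prompt and generation settings are illustrative, not prescribed by this page:

```python
# Minimal sketch: load Qwen/Qwen2.5-7B-Instruct-1M with transformers and
# generate a chat completion. Assumes a GPU with enough memory; dtype and
# device placement are left to transformers via "auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-1M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give a short introduction to rotary position embeddings."},
]
# Build the prompt with the model's own chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```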


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each config specifies the sampler parameters listed below (see the request sketch after the list).

temperature · top_p · top_k · frequency_penalty · presence_penalty · repetition_penalty · min_p
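The actual top-3 values live in the interactive tabs and are not reproduced here. As a hypothetical sketch, this is how such a config could be sent to an OpenAI-compatible chat completions endpoint: the base URL, API key, and every sampler value below are placeholder assumptions, and top_k, repetition_penalty, and min_p are non-standard OpenAI parameters, passed via extra_body on servers that accept them.

```python
# Hypothetical sketch: send a chat completion with explicit sampler settings
# to an OpenAI-compatible endpoint. All values below are placeholders, not
# the actual Featherless user configs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumption: OpenAI-compatible route
    api_key="YOUR_API_KEY",                    # placeholder
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-1M",
    messages=[{"role": "user", "content": "Hello!"}],
    # Standard OpenAI sampler parameters (placeholder values):
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard parameters go through extra_body on servers that accept them:
    extra_body={"top_k": 40, "repetition_penalty": 1.05, "min_p": 0.05},
)
print(response.choices[0].message.content)
```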