TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-fp16
TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-fp16 is a 13 billion parameter causal language model created by merging WizardLM's WizardLM 13B V1.1 with Kaio Ken's SuperHOT 8K LoRA. The merge extends the context length to 8192 tokens, making the model suitable for applications that need longer conversational memory or must process extensive documents. It is provided in fp16 PyTorch format for GPU inference and further conversions; note that the SuperHOT LoRA it incorporates was developed as an NSFW-focused prototype.
Model Overview
This model merges WizardLM's WizardLM 13B V1.1 with Kaio Ken's SuperHOT 8K LoRA and is distributed in fp16 PyTorch format, optimized for GPU inference.
Key Capabilities & Features
- Extended Context Window: Achieves an 8K (8192 token) context length, four times the base model's native 2048 tokens, enabled by the SuperHOT 8K merge and loading with `trust_remote_code=True` (a sketch of the underlying linear position scaling follows this list).
- WizardLM Base: Built upon WizardLM-13B V1.1, which has demonstrated strong performance on benchmarks such as MT-Bench (6.74), AlpacaEval (86.32%), and WizardLM Eval (99.3%).
- SuperHOT Merge: Incorporates Kaio Ken's SuperHOT LoRA, which its author describes as an NSFW-focused prototype, so the merged model may be specialized toward that kind of content.
- Flexible Deployment: Available in various formats including 4-bit GPTQ for GPU, 2-8 bit GGML for CPU, and this unquantized fp16 PyTorch version for high-fidelity inference and further model conversions.
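The sketch below illustrates the linear RoPE position scaling that SuperHOT-style 8K extensions rely on. It is not code from this repository; the function name, head dimension, and RoPE base are assumptions chosen to mirror LLaMA-style rotary embeddings, with a scale factor of 4 matching the recommended `compress_pos_emb 4`.

```python
import torch

# Illustrative sketch of SuperHOT-style linear RoPE position scaling.
# Assumptions (not this repo's actual code): LLaMA-style rotary embeddings,
# head_dim=128, base=10000, and a scale factor of 4.
def scaled_rope_angles(seq_len: int, head_dim: int = 128,
                       base: float = 10000.0, scale: float = 4.0) -> torch.Tensor:
    """Angle table for rotary embeddings with linearly interpolated positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / scale  # compress positions by `scale`
    return torch.outer(positions, inv_freq)            # shape: (seq_len, head_dim // 2)

# With scale=4, position 8191 maps to 2047.75, which stays inside the
# 0-2047 position range the 2048-context base model was trained on.
angles = scaled_rope_angles(seq_len=8192)
print(angles.shape)  # torch.Size([8192, 64])
```

Dividing positions by the scale factor compresses the 8192-token window into the positional range the base model already understands, which is what the `scale` parameter mentioned under Usage Considerations controls.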
Usage Considerations
- Context Handling: Requires `trust_remote_code=True` in Hugging Face Transformers to properly utilize the 8K context window; this automatically sets the `scale` parameter for the position embeddings (see the loading sketch after this list).
- Monkey Patch: A `llama_rope_scaled_monkey_patch.py` is provided for environments that do not natively support the required scaling, allowing the scaling factor to be applied manually.
- Oobabooga Compatibility: The arguments `--max_seq_len 8192 --compress_pos_emb 4` are recommended for optimal performance with Oobabooga's ExLlama loader.
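A minimal loading sketch follows, assuming the `transformers` and `accelerate` libraries and a GPU setup with roughly 26 GB of VRAM for the 13B fp16 weights. The Vicuna-style prompt shown is an assumption based on WizardLM V1.1's usual template; verify against the model card before relying on it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights, as shipped
    device_map="auto",           # place layers across available GPUs via accelerate
    trust_remote_code=True,      # lets the repo's custom code set the RoPE scale for 8K
)

# Vicuna-style prompt (assumed format; check the model card for the exact template).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: Summarize the key differences between fp16 and 4-bit GPTQ inference. "
    "ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the installed Transformers version does not honor the repo's custom scaling code, the provided `llama_rope_scaled_monkey_patch.py` can be applied before loading instead, per the Monkey Patch note above.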