TheBloke/Wizard-Vicuna-7B-Uncensored-SuperHOT-8K-fp16
TheBloke/Wizard-Vicuna-7B-Uncensored-SuperHOT-8K-fp16 is a 7 billion parameter causal language model published by TheBloke, merging Eric Hartford's Wizard Vicuna 7B Uncensored with Kaio Ken's SuperHOT 8K. This fp16 PyTorch model is intended for GPU inference and supports an extended context length of 8192 tokens. It inherits the uncensored behaviour of its Wizard Vicuna base and is suited to tasks that need longer context.
Model Overview
This model, TheBloke/Wizard-Vicuna-7B-Uncensored-SuperHOT-8K-fp16, is a 7 billion parameter language model in fp16 PyTorch format. It is a merge of Eric Hartford's Wizard Vicuna 7B Uncensored and Kaio Ken's SuperHOT 8K.
Key Capabilities
- Extended Context Window: Achieves an 8192-token context length during inference by using `trust_remote_code=True` together with the repository's modified `config.json` (see the loading sketch after this list).
- Uncensored Responses: Based on Eric Hartford's Wizard Vicuna 7B Uncensored, which was trained with alignment/moralizing responses removed, allowing for unconstrained output.
- GPU Inference: Provided in fp16 PyTorch format, optimized for direct use on GPUs.
- SuperHOT Integration: Incorporates Kaio Ken's SuperHOT 7B LoRA, which was trained with a focus on NSFW content and extended context capabilities.
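
Below is a minimal loading sketch, assuming a standard transformers + PyTorch setup on a CUDA GPU (with accelerate installed for `device_map="auto"`). The repository name, fp16 weights, and the `trust_remote_code=True` requirement for the 8192-token context come from this model card; the Vicuna-style prompt and generation parameters are illustrative.

```python
# Minimal sketch: load the fp16 model for GPU inference with the extended
# 8K context enabled via trust_remote_code (assumes transformers, torch,
# accelerate and a CUDA GPU are available).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Wizard-Vicuna-7B-Uncensored-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights, as shipped in this repo
    device_map="auto",          # place layers on the available GPU(s)
    trust_remote_code=True,     # loads the custom RoPE-scaling code that, together
                                # with the modified config.json, extends the
                                # context window to 8192 tokens
)

# Illustrative Vicuna-style prompt; adjust to your use case.
prompt = "USER: Summarise the following document.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Omitting `trust_remote_code=True` falls back to the stock Llama loading path, so the extended context provided by the SuperHOT modifications would not be applied.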
Good For
- Applications requiring long-context understanding and generation.
- Use cases where uncensored or unfiltered responses are desired.
- Developers looking for a base model for further fine-tuning or experimentation with custom alignment.
- Scenarios benefiting from the SuperHOT LoRA's specific training focus.
For lower-VRAM GPU inference, 4-bit GPTQ versions are available; for CPU inference, 2, 3, 4, 5, 6 and 8-bit GGML versions are available.