TheBloke/WizardLM-7B-V1-0-Uncensored-SuperHOT-8K-fp16
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · License: other · Architecture: Transformer

TheBloke/WizardLM-7B-V1-0-Uncensored-SuperHOT-8K-fp16 is a 7-billion-parameter language model created by merging Eric Hartford's WizardLM-7B-V1.0-Uncensored with Kaio Ken's SuperHOT 8K. The model is designed to give uncensored responses and supports an extended context length of 8192 tokens, making it suitable for applications that require longer conversational memory or detailed text processing. It is provided in fp16 PyTorch format for GPU inference and further conversions.


Model Overview

This model, TheBloke/WizardLM-7B-V1-0-Uncensored-SuperHOT-8K-fp16, is a 7-billion-parameter language model resulting from a merge of two distinct models:

  • Eric Hartford's WizardLM-7B-V1.0-Uncensored: A retraining of WizardLM-7B-V1.0 with a filtered dataset to reduce refusals, avoidance, and bias, providing a more compliant, uncensored response style. It uses Vicuna-1.1 style prompts.
  • Kaio Ken's SuperHOT 8K: A prototype LoRA (Low-Rank Adaptation) focused on NSFW content, featuring an extended 8K context length and no RLHF (Reinforcement Learning from Human Feedback).

Key Capabilities & Features

  • Extended Context Window: Achieves an 8192-token context length, significantly enhancing its ability to handle longer inputs and maintain conversational coherence over extended interactions. This is enabled by the SuperHOT 8K merge together with position-embedding (RoPE) scaling applied at load time.
  • Uncensored Responses: Designed to reduce inherent model biases and refusals, offering more direct and unfiltered outputs compared to standard WizardLM versions.
  • PyTorch fp16 Format: Provided in a format suitable for GPU inference and further model conversions or fine-tuning.
  • Prompt Style: Utilizes Vicuna-1.1 style prompts, formatted as USER: <prompt>\nASSISTANT:.
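The Vicuna-1.1 prompt format above can be produced with a small helper; a minimal sketch (the function name is illustrative, only the `USER:`/`ASSISTANT:` template comes from the model card):

```python
def build_prompt(user_message: str) -> str:
    """Format a message in the Vicuna-1.1 style this model expects."""
    return f"USER: {user_message}\nASSISTANT:"

# The model's completion is then appended after "ASSISTANT:".
print(build_prompt("Summarize the plot of Hamlet in two sentences."))
```

For multi-turn use, prior turns are typically concatenated in the same `USER: ... ASSISTANT: ...` pattern before the final `ASSISTANT:` cue.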

Usage Considerations

  • Context Extension: To fully utilize the 8K context, trust_remote_code=True is required during model loading, which automatically sets the scale parameter based on max_position_embeddings.
  • Responsibility: As an uncensored model, users are responsible for the content generated and its implications.
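The scale parameter mentioned above is a linear RoPE interpolation factor. A minimal sketch of the arithmetic, assuming the original LLaMA pretraining context of 2048 tokens (the custom modeling code enabled by trust_remote_code=True is what actually applies this at load time):

```python
# Assumption: base LLaMA models were pretrained with a 2048-token context.
ORIGINAL_CTX = 2048

def rope_scale(max_position_embeddings: int) -> float:
    """Linear RoPE scaling factor used by SuperHOT-style context extension."""
    return max_position_embeddings / ORIGINAL_CTX

# For this model's 8192-token context the factor is 4.0:
print(rope_scale(8192))
```

In practice this means position indices are compressed by the scale factor so that 8192 positions map into the range the model saw during pretraining.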

Good for

  • Applications requiring extended conversational memory or processing of long documents.
  • Use cases where unfiltered or less biased responses are preferred.
  • Developers looking for a 7B parameter model with enhanced context for experimentation or specific content generation tasks.