TheBloke/Nous-Hermes-13B-SuperHOT-8K-fp16

Text generation · Concurrency cost: 1 · Model size: 13B · Quant: FP8 · Context length: 4K · Published: Jun 26, 2023 · License: other · Architecture: Transformer

TheBloke/Nous-Hermes-13B-SuperHOT-8K-fp16 is a 13-billion-parameter language model created by merging NousResearch's Nous-Hermes-13B with Kaio Ken's SuperHOT 8K LoRA. The merge extends the context length to 8192 tokens, making the model suitable for tasks that process longer inputs and generate comprehensive responses. It is built on the Llama architecture, fine-tuned on over 300,000 instructions (primarily synthetic GPT-4 outputs), and optimized for general language tasks with an emphasis on low hallucination and uncensored output.


Overview

This model, TheBloke/Nous-Hermes-13B-SuperHOT-8K-fp16, is a 13 billion parameter language model built upon NousResearch's Nous-Hermes-13B, further enhanced by merging with Kaio Ken's SuperHOT 8K LoRA. The primary differentiator is its significantly extended context window, supporting up to 8192 tokens, which allows for processing and generating much longer sequences of text. The base Nous-Hermes-13B model was fine-tuned by Nous Research on over 300,000 instructions, predominantly derived from synthetic GPT-4 outputs, across a diverse range of datasets including GPTeacher, CodeAlpaca, and Airoboros.
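SuperHOT-style context extension works by RoPE position interpolation: position indices are linearly compressed by the ratio of the original to the extended context length before the rotary angles are computed, so all 8192 positions fall inside the positional range the base Llama model saw during training. A minimal sketch of the idea (function name and head dimension are illustrative, not taken from this repository's code):

```python
def interpolated_rope_angles(position, dim=128, base=10000.0,
                             original_ctx=2048, extended_ctx=8192):
    """Return the RoPE rotation angles for one token position, with the
    position index linearly compressed (position interpolation) so that
    extended_ctx positions fit inside the base model's trained range."""
    scale = original_ctx / extended_ctx      # 0.25 for 2048 -> 8192
    scaled_pos = position * scale
    # Standard RoPE inverse frequencies for a head dimension of `dim`.
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [scaled_pos * f for f in inv_freq]

# The last position of the 8K window (8191) maps to ~2047.75, i.e. it
# stays inside the base model's trained range of [0, 2048).
angles_last = interpolated_rope_angles(8191)
```

Because the compression is applied uniformly, relative distances between tokens are preserved up to the same scale factor, which is why the merged model remains coherent over the longer window.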

Key Capabilities

  • Extended Context Handling: Achieves an 8K context length, enabling the model to maintain coherence and leverage information over longer conversations or documents.
  • Instruction Following: Fine-tuned on a vast instruction dataset, leading to strong performance in understanding and executing complex instructions.
  • Reduced Hallucination: Tuned for a low hallucination rate, producing more factual outputs than comparable instruction-tuned models.
  • Uncensored Output: Trained without OpenAI-style refusal and censorship behaviors, offering more flexibility in response generation.
  • Strong General Performance: Benchmarks indicate competitive performance against models like GPT-3.5-turbo across various tasks, including ARC-c, ARC-e, Hellaswag, and OpenBookQA.
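To get the instruction-following behavior described above, prompts should follow the Alpaca-style format that the base Nous-Hermes-13B model was fine-tuned on. A small helper illustrating the format (the helper name is ours; the `### Instruction:` / `### Input:` / `### Response:` layout is the documented one):

```python
def build_prompt(instruction, input_text=None):
    """Assemble an Alpaca-style prompt as used by Nous-Hermes-13B."""
    if input_text:
        return (
            "### Instruction:\n"
            f"{instruction}\n\n"
            "### Input:\n"
            f"{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )

prompt = build_prompt("Summarize the following document.",
                      "…document text…")
```

Generation should then continue from the trailing `### Response:` marker; the optional `### Input:` section carries any context (such as a long document) that the instruction refers to.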

Good For

  • Applications requiring processing or generating long texts, such as summarization of lengthy documents, detailed content creation, or extended conversational AI.
  • Developers seeking a powerful 13B parameter model with an expanded context window for general language understanding and generation tasks.
  • Use cases where adherence to specific instructions and a low hallucination rate are critical.
  • Scenarios where an uncensored output is preferred or necessary.