TheBloke/CAMEL-13B-Role-Playing-Data-SuperHOT-8K-fp16

Text generation · Concurrency cost: 1 · Model size: 13B · Quant: FP16 · Context length: 8K · Published: Jun 27, 2023 · License: other · Architecture: Transformer

TheBloke/CAMEL-13B-Role-Playing-Data-SuperHOT-8K-fp16 is a 13 billion parameter language model created by TheBloke by merging Camel AI's CAMEL-13B-Role-Playing-Data with Kaio Ken's SuperHOT 8K. It is designed for role-playing scenarios and features an extended context window of 8192 tokens. The weights are provided in fp16 PyTorch format for GPU inference, making the model suitable for applications that require longer conversational memory.


Model Overview

This model, TheBloke/CAMEL-13B-Role-Playing-Data-SuperHOT-8K-fp16, is a 13 billion parameter language model. It is a merge of two distinct models:

  • Camel AI's CAMEL-13B-Role-Playing-Data: A chat model fine-tuned on 229K role-playing conversations, achieving an average score of 57.2 on EleutherAI's language model evaluation harness.
  • Kaio Ken's SuperHOT 8K: A prototype model that extends the context window to 8192 tokens using the position-scaling technique described on Kaiokendev's GitHub blog (illustrated in the sketch below).
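
The SuperHOT extension is based on linearly interpolating rotary (RoPE) positions so that 8192 positions map into the range the base model was trained on. The toy function below is only an illustrative sketch of that idea, not the model's actual code; the function name and the `dim=128` head size are assumptions for the example:

```python
# Toy illustration of SuperHOT-style position interpolation: 8192 positions
# are compressed into the trained 2048-position range, i.e. scale = 2048/8192.

def rope_angles(position: int, dim: int, scale: float = 0.25, base: float = 10000.0) -> list[float]:
    """Rotary-embedding angles for one token position, with the position
    compressed by `scale` to stretch the usable context window."""
    pos = position * scale  # interpolate: 8K positions -> trained 2K range
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

# Position 8192 under scale 0.25 yields the same angles as position 2048 in
# the unscaled model, keeping attention inside its trained regime.
assert rope_angles(8192, dim=128) == rope_angles(2048, dim=128, scale=1.0)
```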

Key Capabilities

  • Extended Context Window: Supports an 8192-token context length, significantly enhancing its ability to maintain long-form conversations and understand complex, multi-turn interactions.
  • Role-Playing Specialization: Inherits fine-tuning on extensive role-playing datasets, making it adept at generating contextually appropriate and engaging responses in character-driven scenarios.
  • fp16 PyTorch Format: Shipped as fp16 PyTorch weights, suitable for GPU inference and further model conversions (see the loading sketch below).
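
As a concrete example, here is a minimal sketch of loading the fp16 weights for GPU inference with the Hugging Face transformers library. The prompt and generation settings are illustrative assumptions, not recommendations from the model authors:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/CAMEL-13B-Role-Playing-Data-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are shipped in fp16
    device_map="auto",          # spread layers across available GPUs
    trust_remote_code=True,     # required so the SuperHOT 8K scaling is applied
)

prompt = "You are the innkeeper of a remote mountain tavern. A hooded stranger enters.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```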

Usage Considerations

  • To fully utilize the 8K context, trust_remote_code=True must be passed when loading the model; the bundled modeling code then sets the scale parameter automatically from max_position_embeddings.
  • The config.json ships with the sequence length pre-set to 8192, but it can be lowered if a smaller context is desired (see the sketch after this list).
  • Other quantized versions (GPTQ, GGML) are available for different hardware and inference needs.
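
If a smaller window is preferred, one approach is to override the config before instantiating the model. This is a hedged sketch: per this card the remote code derives its scale from max_position_embeddings, so the scaling should follow the override, but the exact behavior depends on the bundled modeling code:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "TheBloke/CAMEL-13B-Role-Playing-Data-SuperHOT-8K-fp16"

# Fetch the shipped config (pre-set to 8192) and lower the context window.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.max_position_embeddings = 4096  # e.g. halve the context

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
```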