TheBloke/GPT4All-13B-Snoozy-SuperHOT-8K-fp16
TEXT GENERATION
Concurrency Cost: 1 | Model Size: 13B | Quant: FP16 | Ctx Length: 8K | Published: Jun 27, 2023 | License: other | Architecture: Transformer
TheBloke/GPT4All-13B-Snoozy-SuperHOT-8K-fp16 is a 13-billion-parameter Llama-based model, developed by Nomic AI and merged with Kaio Ken's SuperHOT 8K LoRA. This fp16 PyTorch build is optimized for GPU inference and extends the context length to 8192 tokens. The base model performs strongly on common sense reasoning benchmarks such as BoolQ and WinoGrande.
Model Overview
This model, GPT4All-13B-Snoozy-SuperHOT-8K-fp16, is a 13 billion parameter Llama-based language model. It is a merge of Nomic AI's GPT4All Snoozy 13B with Kaio Ken's SuperHOT 8K LoRA, specifically provided in fp16 PyTorch format for GPU inference.
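For reference, a merge like this one can be reproduced with the `peft` library. This is a minimal sketch only: the base and LoRA repo IDs below are assumptions for illustration, and the published model already ships pre-merged.

```python
# Sketch of folding a SuperHOT-style LoRA into a base model with peft.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE = "nomic-ai/gpt4all-13b-snoozy"               # assumed base repo ID
LORA = "kaiokendev/superhot-13b-8k-no-rlhf-test"   # assumed LoRA repo ID

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, LORA)
merged = merged.merge_and_unload()  # fold the LoRA weights into the base model
merged.save_pretrained("gpt4all-13b-snoozy-superhot-8k-fp16")
```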
Key Capabilities
- Extended Context Window: Achieves an 8192-token context length during inference by leveraging the SuperHOT 8K merge and `trust_remote_code=True` in Hugging Face Transformers (see the loading sketch after this list).
- Common Sense Reasoning: The base GPT4All 13B Snoozy model shows strong performance on common sense reasoning benchmarks, outperforming several other 7B and 13B models in categories like BoolQ and WinoGrande.
- Instruction Following: Finetuned on a curated corpus of assistant interactions, including multi-turn dialogue, code, poems, and stories, indicating proficiency in instruction-tuned tasks.
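A minimal loading sketch, assuming a CUDA GPU and following the card's note that `trust_remote_code=True` is required for the 8192-token context. The Alpaca-style prompt template is an assumption; consult the model card for the exact format the finetune expects.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "TheBloke/GPT4All-13B-Snoozy-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,   # fp16 weights for GPU inference
    device_map="auto",
    trust_remote_code=True,      # enables the extended-context attention code
)

# Alpaca-style prompt is an assumption; check the card for the real template.
prompt = (
    "### Instruction:\nSummarize the plot of Hamlet in two sentences.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```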
Good For
- Applications requiring a larger context window for more coherent and extended interactions.
- Tasks benefiting from strong common sense reasoning capabilities.
- Developers looking for an fp16 PyTorch model for GPU inference, with options for further quantization (GPTQ, GGML) available from TheBloke.
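For the quantized variants mentioned above, a hedged sketch of loading a GPTQ build with AutoGPTQ follows; the `-GPTQ` repo ID is inferred from TheBloke's usual naming scheme, not confirmed here.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

MODEL = "TheBloke/GPT4All-13B-Snoozy-SuperHOT-8K-GPTQ"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    MODEL,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,  # still needed for the extended-context code
)
```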