Overview
digitalpipelines/llama2_7b_chat_uncensored is a 7-billion-parameter language model from digitalpipelines. It is a fine-tuned version of the OpenLLaMA-7B base model, adapted for conversational use. Fine-tuning used an uncensored/unfiltered Wizard-Vicuna conversation dataset (digitalpipelines/wizard_vicuna_70k_uncensored) and the QLoRA method, following a process outlined by George Sung.
Key Characteristics
- Base Model: OpenLLaMA-7B
- Parameter Count: 7 billion
- Fine-tuning Method: QLoRA
- Training Data: Uncensored Wizard-Vicuna conversation dataset
- Prompt Format: Uses a ### HUMAN: and ### RESPONSE: structure for turn-based conversations.
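The turn-based prompt structure above can be sketched as a small helper. The `### HUMAN:` and `### RESPONSE:` markers come from the model card; the exact newline spacing between turns is an assumption, not a verified template.

```python
def build_prompt(turns, user_input):
    """Assemble a turn-based prompt using ### HUMAN: / ### RESPONSE: markers.

    `turns` is a list of (human, response) pairs from earlier in the
    conversation. The whitespace layout here is an assumption based on
    the documented markers.
    """
    parts = []
    for human, response in turns:
        parts.append(f"### HUMAN:\n{human}\n\n### RESPONSE:\n{response}\n\n")
    # End with an open RESPONSE block for the model to complete.
    parts.append(f"### HUMAN:\n{user_input}\n\n### RESPONSE:\n")
    return "".join(parts)

prompt = build_prompt([], "What is QLoRA?")
```

The trailing `### RESPONSE:` marker leaves the completion point open so the model generates the assistant turn.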
Quantized Versions Available
For optimized inference, digitalpipelines provides several quantized versions of this model:
- GPTQ: Available as digitalpipelines/llama2_7b_chat_uncensored-GPTQ.
- GGML: 2, 3, 4, 5, 6, and 8-bit quantized models for CPU+GPU inference, found under digitalpipelines/llama2_7b_chat_uncensored-GGML.
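A minimal sketch of loading one of the GGML variants for CPU+GPU inference. The repository ids are taken from the list above; the loading call assumes the third-party ctransformers package (`pip install ctransformers`), which is one common way to run GGML Llama models and is not prescribed by the model card.

```python
# Repo ids as published under the digitalpipelines namespace.
GPTQ_REPO = "digitalpipelines/llama2_7b_chat_uncensored-GPTQ"
GGML_REPO = "digitalpipelines/llama2_7b_chat_uncensored-GGML"

def load_ggml(repo_id=GGML_REPO):
    # Import inside the function so this sketch stays importable even
    # when ctransformers is not installed.
    from ctransformers import AutoModelForCausalLM
    # model_type="llama" tells ctransformers which GGML architecture
    # the quantized weights use.
    return AutoModelForCausalLM.from_pretrained(repo_id, model_type="llama")
```

Calling `load_ggml()` downloads the quantized weights from the Hub and returns a model object whose `__call__` generates text from a prompt string.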
Intended Use
This model is suited for applications where an unfiltered, direct conversational style is desired, distinguishing it from models with built-in censorship or strict content moderation. Because it was trained on an uncensored dataset, it can generate responses without the guardrails typical of many other chat-optimized LLMs.