Model Overview
MRNH/llama-2-13b-chat-hf is a 13 billion parameter language model built upon the Llama 2 architecture, specifically designed for chat-based interactions. It supports a context length of 4096 tokens, enabling it to handle moderately long conversational turns.
Training Details
This model was trained using bitsandbytes 4-bit quantization, specifically employing the nf4 quantization type with float16 compute dtype. This approach allows for efficient memory usage during training and inference, making it suitable for environments with resource constraints. The training process utilized PEFT version 0.5.0.
Key Characteristics
- Base Model: Llama 2
- Parameter Count: 13 billion
- Context Window: 4096 tokens
- Quantization: Trained with bitsandbytes 4-bit quantization (nf4 type, float16 compute dtype)
Use Cases
This model is well-suited for applications requiring:
- Conversational AI: Engaging in dialogue, answering questions, and generating human-like text in a chat format.
- Resource-Efficient Deployment: Its 4-bit quantization makes it a candidate for deployment on hardware with limited memory, while still offering the capabilities of a 13B parameter model.
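For the conversational use case, Llama 2 chat models expect a specific prompt layout with `[INST]`/`[/INST]` markers and an optional `<<SYS>>` system block. A small helper sketching that format (single-turn only; multi-turn conversations concatenate additional `[INST] ... [/INST]` segments):

```python
def format_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Build a single-turn Llama 2 chat prompt.

    Follows the standard Llama 2 chat template: the system prompt is
    wrapped in <<SYS>> tags inside the first [INST] block.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


if __name__ == "__main__":
    prompt = format_llama2_prompt(
        "You are a helpful assistant.",
        "Summarize the benefits of 4-bit quantization.",
    )
    print(prompt)
```

The resulting string can be passed to the tokenizer and model's `generate` method; newer `transformers` versions can also apply this template automatically via `tokenizer.apply_chat_template`.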