MRNH/llama-2-13b-chat-hf

Text Generation · Model Size: 13B · Quantization: FP8 · Context Length: 4k · Architecture: Transformer

MRNH/llama-2-13b-chat-hf is a 13-billion-parameter conversational language model based on Llama 2 and developed by MRNH. The model is fine-tuned specifically for chat applications and supports a 4096-token context length. Its training incorporated 4-bit quantization via bitsandbytes, making it memory-efficient to deploy while preserving performance for interactive dialogue.


Model Overview

MRNH/llama-2-13b-chat-hf is a 13-billion-parameter language model built on the Llama 2 architecture and designed specifically for chat-based interactions. It supports a context length of 4096 tokens, enabling it to handle moderately long conversational exchanges.

Training Details

This model was trained using bitsandbytes 4-bit quantization, with the nf4 quantization type and a float16 compute dtype. This approach reduces memory usage during training and inference, making the model suitable for resource-constrained environments. Fine-tuning used PEFT (parameter-efficient fine-tuning) version 0.5.0.

Key Characteristics

  • Base Model: Llama 2
  • Parameter Count: 13 billion
  • Context Window: 4096 tokens
  • Quantization: Trained with bitsandbytes 4-bit quantization (nf4 type, float16 compute dtype)

Use Cases

This model is well-suited for applications requiring:

  • Conversational AI: Engaging in dialogue, answering questions, and generating human-like text in a chat format.
  • Resource-Efficient Deployment: Its 4-bit quantization makes it a candidate for deployment on hardware with limited memory, while still offering the capabilities of a 13B parameter model.
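For chat use, Llama 2 chat models generally expect the `[INST] ... [/INST]` prompt format with an optional `<<SYS>>` system block. Assuming this fine-tune follows the standard Llama-2-chat convention (the card does not state otherwise; check the tokenizer's chat template to confirm), a single-turn prompt can be built like this:

```python
# Builds a single-turn prompt in the standard Llama 2 chat format:
# <s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]
# This assumes the fine-tune uses the base Llama-2-chat template.
def build_llama2_prompt(user_message: str, system_prompt: str = "") -> str:
    sys_block = ""
    if system_prompt:
        sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    return f"<s>[INST] {sys_block}{user_message} [/INST]"

prompt = build_llama2_prompt(
    "What is 4-bit quantization?",
    system_prompt="You are a helpful assistant.",
)
```

The resulting string is passed to the tokenizer as the generation input; the model's reply is everything produced after the closing `[/INST]` tag.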