lvkaokao/llama2-7b-hf-chat-lora
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Architecture: Transformer · Cold
The lvkaokao/llama2-7b-hf-chat-lora model is a 7 billion parameter Llama 2-based language model. It was fine-tuned using LoRA with 4-bit quantization (nf4) and double quantization for efficient deployment. This model is designed for chat applications, leveraging the Llama 2 architecture for conversational tasks.
Model Overview
The lvkaokao/llama2-7b-hf-chat-lora is a 7 billion parameter language model built upon the Llama 2 architecture. This model has been fine-tuned specifically for chat-based applications, making it suitable for conversational AI tasks.
Key Technical Details
- Base Model: Llama 2
- Parameter Count: 7 billion
- Context Length: 4096 tokens
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Quantization: Utilizes `bitsandbytes` for 4-bit quantization (`nf4` type) with double quantization enabled and a `bfloat16` compute dtype. This configuration aims to reduce memory footprint and improve inference efficiency while maintaining performance.
- Framework: PEFT (Parameter-Efficient Fine-Tuning) version 0.4.0 was used during the training process.
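The settings above can be sketched as a loading snippet. This is a minimal example, not an official recipe: the base-model repo id (`meta-llama/Llama-2-7b-hf`) is an assumption, and the quantization values simply mirror the card, so verify them against the adapter's config before use.

```python
# Sketch: load the Llama 2 base in 4-bit (nf4, double quantization,
# bfloat16 compute) and attach the LoRA adapter with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # normal-float 4-bit, as on the card
    bnb_4bit_use_double_quant=True,      # double quantization enabled
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumed base repo id; the card only says "Llama 2".
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lvkaokao/llama2-7b-hf-chat-lora")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```

Loading the adapter on top of a quantized base keeps the memory footprint close to the 4-bit base model, since the LoRA weights add only a small number of parameters.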
Intended Use Cases
This model is primarily intended for:
- Developing conversational agents and chatbots.
- Applications requiring efficient, Llama 2-based chat capabilities.
- Scenarios where reduced memory usage during inference is critical due to 4-bit quantization.
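For chat use, prompts are typically wrapped in the Llama 2 chat template. The helper below is a hypothetical sketch that assumes this adapter follows the standard `[INST]`/`<<SYS>>` format; the card does not state the template explicitly, so confirm against the adapter's training setup.

```python
def build_llama2_prompt(system_msg: str, user_msg: str) -> str:
    """Format a single-turn prompt in the standard Llama 2 chat layout.

    Assumes the adapter was trained on the usual [INST] ... [/INST]
    template with an optional <<SYS>> system block.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    "Summarize what LoRA fine-tuning is in one sentence.",
)
```

The formatted string is then tokenized and passed to `model.generate`; keeping the template consistent with training generally matters more for chat quality than generation hyperparameters.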