lvkaokao/llama2-7b-hf-chat-lora

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Architecture: Transformer · Cold

The lvkaokao/llama2-7b-hf-chat-lora model is a 7-billion-parameter language model based on Llama 2. It was fine-tuned with LoRA using 4-bit (nf4) quantization and double quantization for efficient deployment, and is designed for chat applications and conversational tasks.


Model Overview

The lvkaokao/llama2-7b-hf-chat-lora model is a 7-billion-parameter language model built on the Llama 2 architecture. It has been fine-tuned specifically for chat-based applications, making it well suited to conversational AI tasks.

Key Technical Details

  • Base Model: Llama 2
  • Parameter Count: 7 billion
  • Context Length: 4096 tokens
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit (nf4) quantization via bitsandbytes, with double quantization enabled and a bfloat16 compute dtype. This configuration reduces the memory footprint and improves inference efficiency while maintaining performance (see the loading sketch after this list).
  • Framework: PEFT (Parameter-Efficient Fine-Tuning) version 0.4.0 was used during the training process.
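As a rough illustration, the sketch below loads the LoRA adapter on top of a 4-bit quantized Llama 2 base using the configuration described above. The base checkpoint name is an assumption and should be checked against the adapter's own config; the adapter repo id comes from this model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit nf4 quantization with double quantization and bfloat16 compute dtype,
# matching the configuration listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_id = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint; verify against the adapter config
adapter_id = "lvkaokao/llama2-7b-hf-chat-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
```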

Intended Use Cases

This model is primarily intended for:

  • Developing conversational agents and chatbots (a minimal generation example follows this list).
  • Applications requiring efficient, Llama 2-based chat capabilities.
  • Scenarios where the reduced memory footprint from 4-bit quantization is critical for inference.
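
Continuing from the loading sketch above, a minimal generation call might look like the following. The prompt format is illustrative only, since the exact chat template used during fine-tuning is not documented here.

```python
# Hypothetical chat-style prompt; adjust to the template the adapter was trained with.
prompt = "### Human: Explain what LoRA fine-tuning is in one paragraph.\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```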