sminchoi/llama-2-7b-chat-hf_guanaco-llama2-1k_230913
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Architecture: Transformer

sminchoi/llama-2-7b-chat-hf_guanaco-llama2-1k_230913 is a 7-billion-parameter language model based on the Llama 2 architecture, fine-tuned on the Guanaco dataset. The model is designed for chat-based applications and supports a 4096-token context window. Fine-tuning used bitsandbytes 4-bit quantization, making the model suitable for efficient deployment in conversational AI systems.


Model Overview

sminchoi/llama-2-7b-chat-hf_guanaco-llama2-1k_230913 is a 7-billion-parameter model built on the Llama 2 architecture. It was fine-tuned specifically for chat applications using the Guanaco dataset. The model supports a context length of 4096 tokens, enabling it to handle moderately long dialogues.
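Llama 2 chat models generally expect prompts in the `[INST] ... [/INST]` format, with an optional `<<SYS>>` system block. A minimal helper, assuming this fine-tune kept the base Llama 2 chat format (the model card does not state the prompt template explicitly):

```python
def format_llama2_chat(user_message, system_prompt=None):
    """Build a single-turn Llama 2 chat prompt ([INST] ... [/INST])."""
    if system_prompt:
        # System instructions go inside a <<SYS>> block before the user turn.
        return (
            f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"

prompt = format_llama2_chat("What is the capital of France?")
# → "[INST] What is the capital of France? [/INST]"
```

The formatted string is what you would pass to the tokenizer before generation; multi-turn dialogues chain additional `[INST] ... [/INST]` pairs with the model's previous replies in between.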

Training Details

The training process for this model incorporated bitsandbytes 4-bit quantization. Key quantization parameters included:

  • load_in_4bit: True
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float16

This quantization strategy aims to reduce memory footprint and improve inference efficiency while maintaining performance. The training also utilized PEFT (Parameter-Efficient Fine-Tuning) version 0.4.0.

Potential Use Cases

Given its chat-oriented fine-tuning and efficient quantization, this model is well-suited for:

  • Conversational AI: Developing chatbots and virtual assistants.
  • Interactive applications: Powering dialogue systems where efficient inference is beneficial.
  • Resource-constrained environments: Deploying on hardware with limited memory due to its 4-bit quantization.