HaroldB/LLama-2-7B
TEXT GENERATION
Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Architecture: Transformer
HaroldB/LLama-2-7B is a 7-billion-parameter language model based on the Llama 2 architecture, fine-tuned with a 4-bit quantization configuration. The model is optimized for efficient deployment and inference, using bitsandbytes to reduce its memory footprint, and is suitable for general-purpose language tasks where computational resources are a constraint.
HaroldB/LLama-2-7B Overview
This model is a 7-billion-parameter variant of the Llama 2 architecture, fine-tuned with a focus on efficient resource utilization. It employs bitsandbytes quantization during training, with bnb_4bit_quant_type set to nf4 and bnb_4bit_compute_dtype set to float16.
Key Characteristics
- Architecture: Llama 2 base model.
- Parameter Count: 7 billion parameters.
- Quantization: Trained with 4-bit quantization (load_in_4bit: True), enabling reduced memory consumption during inference.
- Context Length: Supports a context window of 4096 tokens.
- Framework: Utilizes PEFT (Parameter-Efficient Fine-Tuning) version 0.5.0.dev0.
Good For
- Applications requiring a Llama 2-based model with a smaller memory footprint.
- General language generation and understanding tasks where computational efficiency is prioritized.
- Environments with limited GPU memory, benefiting from 4-bit quantization.