HaroldB/LLama-2-7B

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Architecture: Transformer

HaroldB/LLama-2-7B is a 7-billion-parameter language model based on the Llama 2 architecture, fine-tuned with a 4-bit quantization configuration. The model is optimized for efficient deployment and inference, using bitsandbytes to reduce its memory footprint, and is suitable for general-purpose language tasks where computational resources are constrained.


HaroldB/LLama-2-7B Overview

This model is a 7-billion-parameter variant of the Llama 2 architecture, fine-tuned with an emphasis on efficient resource use. It was trained with bitsandbytes 4-bit quantization, using a bnb_4bit_quant_type of nf4 and a bnb_4bit_compute_dtype of float16.

Key Characteristics

  • Architecture: Llama 2 base model.
  • Parameter Count: 7 billion parameters.
  • Quantization: Trained with 4-bit quantization (load_in_4bit: True), enabling reduced memory consumption during inference.
  • Context Length: Supports a context window of 4096 tokens.
  • Framework: Utilizes PEFT (Parameter-Efficient Fine-Tuning) version 0.5.0.dev0.

Good For

  • Applications requiring a Llama 2-based model with a smaller memory footprint.
  • General language generation and understanding tasks where computational efficiency is prioritized.
  • Environments with limited GPU memory, benefiting from 4-bit quantization.
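As a back-of-envelope illustration of the last point, storing weights in 4 bits instead of float16 cuts the weights-only footprint roughly fourfold (this ignores activations, the KV cache, and quantization metadata overhead):

```python
PARAMS = 7_000_000_000  # 7B parameters

def weight_memory_gib(params: int, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params * bits_per_param / 8 / 2**30

fp16 = weight_memory_gib(PARAMS, 16)  # ~13.0 GiB
nf4 = weight_memory_gib(PARAMS, 4)    # ~3.3 GiB
print(f"fp16: {fp16:.1f} GiB, nf4: {nf4:.1f} GiB")
```

In practice bitsandbytes stores per-block scaling factors alongside the 4-bit weights, so the real footprint sits slightly above the 4-bit figure, but still comfortably within consumer-GPU memory.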