Maxtra/llama-2-7b-frestival

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Architecture: Transformer · Status: Cold

Maxtra/llama-2-7b-frestival is a language model based on Llama-2-7b, developed by Maxtra. It was trained with bitsandbytes 4-bit quantization, using the nf4 quantization type and a float16 compute dtype, and with PEFT 0.4.0 for parameter-efficient fine-tuning. It is intended for scenarios that require a Llama-2-7b base model with this pre-applied quantization configuration.


Maxtra/llama-2-7b-frestival Overview

This model is a variant of the Llama-2-7b architecture, developed by Maxtra. It was fine-tuned under a specific quantization configuration to reduce memory footprint and compute cost. The training process used bitsandbytes for 4-bit quantization, employing the nf4 quantization type with float16 as the compute dtype.

Key Training Details

  • Quantization: The model was trained with load_in_4bit: True and bnb_4bit_quant_type: nf4.
  • Compute Dtype: bnb_4bit_compute_dtype was set to float16.
  • Framework: PEFT version 0.4.0 was used during the training procedure.
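The training settings above correspond to a standard bitsandbytes configuration in the Hugging Face transformers API. As a sketch, loading the model with the same quantization parameters might look like the following (the repository ID comes from this card; availability of a CUDA GPU with bitsandbytes installed is assumed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Mirror the card's training-time quantization settings:
# load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Maxtra/llama-2-7b-frestival",
    quantization_config=bnb_config,
    device_map="auto",  # place layers automatically across available devices
)
tokenizer = AutoTokenizer.from_pretrained("Maxtra/llama-2-7b-frestival")
```

Note that Llama-2 derivatives may require accepting the upstream license and authenticating with the Hub before `from_pretrained` can download the weights.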

Potential Use Cases

This model is suitable for developers looking to leverage a Llama-2-7b base model with pre-applied 4-bit quantization, which can be beneficial for:

  • Resource-constrained environments: The 4-bit quantization can reduce memory footprint.
  • Efficient deployment: Optimized for faster inference on compatible hardware.
  • Further fine-tuning: Provides a quantized base for additional domain-specific adaptations.
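For the further fine-tuning case, a minimal PEFT sketch on top of the 4-bit-loaded model could look like this. The LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`, dropout) are illustrative assumptions, not values from this card; `model` is assumed to be the quantized model loaded as shown above:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit model for k-bit training (casts norms, enables grads on inputs)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA settings -- tune these for your domain and hardware
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice for Llama
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

Training only the low-rank adapters keeps the quantized base weights frozen, which is what makes domain-specific adaptation feasible in resource-constrained environments.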