kstecenko/Tinny-LLAMA2-extractor

Text Generation · Model Size: 1.1B · Quantization: BF16 · Context Length: 2k · Architecture: Transformer · Concurrency Cost: 1

The kstecenko/Tinny-LLAMA2-extractor is a model developed by kstecenko, fine-tuned with bitsandbytes 4-bit quantization using the nf4 quantization type and double quantization enabled. The model was adapted with PEFT 0.6.0.dev0 for parameter-efficient fine-tuning. Its defining characteristic is the application of these quantization techniques during training, making it suitable for scenarios where a reduced memory footprint and computational efficiency are critical.


Overview

The kstecenko/Tinny-LLAMA2-extractor is distinguished by a training procedure that relies heavily on bitsandbytes 4-bit quantization. This approach is designed to optimize the model's efficiency, likely targeting deployment in resource-constrained environments.

Key Training Details

  • Quantization: The model was trained with load_in_4bit: True, employing the nf4 quantization type and bnb_4bit_use_double_quant: True for enhanced precision within the 4-bit scheme.
  • Compute Data Type: bfloat16 was used as the compute data type during 4-bit quantization, balancing numerical stability with performance.
  • Framework: The training process leveraged PEFT (Parameter-Efficient Fine-Tuning) version 0.6.0.dev0, indicating a focus on efficient adaptation rather than full model retraining. These settings map directly onto the transformers and bitsandbytes APIs, as shown in the sketch after this list.
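The following is a minimal sketch of how these reported settings translate into a BitsAndBytesConfig, assuming the repository hosts full model weights loadable through AutoModelForCausalLM (adapter-only PEFT repos need the loading approach sketched in the next section):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Reproduce the quantization settings reported on the model card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: True
    bnb_4bit_quant_type="nf4",              # nf4 quantization type
    bnb_4bit_use_double_quant=True,         # double quantization enabled
    bnb_4bit_compute_dtype=torch.bfloat16,  # bfloat16 compute dtype
)

# Assumption: the repo contains full weights; if it stores only a PEFT
# adapter, load the base model here and attach the adapter with peft.
model = AutoModelForCausalLM.from_pretrained(
    "kstecenko/Tinny-LLAMA2-extractor",
    quantization_config=bnb_config,
    device_map="auto",
)
```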

Potential Use Cases

  • Resource-constrained environments: The 4-bit quantization makes this model potentially suitable for deployment on devices with limited memory or computational power.
  • Efficient fine-tuning: The use of PEFT suggests it's designed for scenarios where rapid and efficient adaptation to new tasks is desired without extensive computational overhead; a hedged loading sketch follows this list.
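Since PEFT repositories often contain only adapter weights, here is a minimal inference sketch assuming an adapter-only repo. AutoPeftModelForCausalLM (available since PEFT 0.6.0) resolves the base model from the adapter config; the prompt text and the presence of tokenizer files in the repo are assumptions:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumption: the repo stores a PEFT adapter; AutoPeftModelForCausalLM
# downloads the base model named in the adapter config and attaches it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "kstecenko/Tinny-LLAMA2-extractor",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Assumption: tokenizer files are present in the repo (otherwise load the
# tokenizer from the base model named in adapter_config.json).
tokenizer = AutoTokenizer.from_pretrained("kstecenko/Tinny-LLAMA2-extractor")

# Hypothetical extraction prompt; the card does not document a prompt format.
prompt = "Extract the key fields from the following text: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```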