nikinetrahutama/afx-ai-llama-chat-model-7

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Architecture: Transformer

The nikinetrahutama/afx-ai-llama-chat-model-7 is a 7 billion parameter Llama-based chat model developed by nikinetrahutama. It was trained with 4-bit quantization via the bitsandbytes library, using nf4 quantization with double quantization enabled for efficient deployment. The model is designed for conversational AI applications, leveraging its Llama architecture for general-purpose chat interactions.

Model Overview

The nikinetrahutama/afx-ai-llama-chat-model-7 is a 7 billion parameter Llama-based chat model. Developed by nikinetrahutama, this model is specifically designed for conversational AI tasks, offering a balance between performance and computational efficiency.

Training Details

The model was trained using quantization techniques that reduce its memory footprint and improve inference speed. Key aspects of its training procedure include:

  • Quantization Method: bitsandbytes was employed for quantization.
  • Quantization Type: It utilizes load_in_4bit: True with bnb_4bit_quant_type: nf4.
  • Double Quantization: bnb_4bit_use_double_quant: True was enabled, further reducing memory usage.
  • Compute Data Type: Training leveraged bfloat16 for computations.
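
The settings above can be sketched as a loading configuration. This is a minimal illustration, assuming the standard Hugging Face transformers / bitsandbytes path; the card lists only the key–value pairs, not the exact loading code.

```python
# Quantization settings exactly as listed on the card.
quant_config = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}

# With transformers and bitsandbytes installed, the same settings would
# typically be passed as a BitsAndBytesConfig when loading the model:
#
#   import torch
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#
#   bnb = BitsAndBytesConfig(
#       load_in_4bit=True,
#       bnb_4bit_quant_type="nf4",
#       bnb_4bit_use_double_quant=True,
#       bnb_4bit_compute_dtype=torch.bfloat16,
#   )
#   model = AutoModelForCausalLM.from_pretrained(
#       "nikinetrahutama/afx-ai-llama-chat-model-7",
#       quantization_config=bnb,
#   )
```

Double quantization (quantizing the quantization constants themselves) is what makes the nf4 setup fit a 7B model into a few gigabytes of memory.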

These configurations indicate a focus on making the model efficient for deployment while maintaining its conversational capabilities. The training process also incorporated PEFT (Parameter-Efficient Fine-Tuning) version 0.5.0.dev0.
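
The card names PEFT 0.5.0.dev0 but not the specific fine-tuning method. LoRA is the most common PEFT technique for Llama models and is assumed here as an illustration; every hyperparameter below is hypothetical, not taken from the card.

```python
# Hypothetical LoRA settings (illustrative only; the card does not
# disclose the PEFT method or its hyperparameters).
lora_settings = {
    "r": 16,                               # adapter rank
    "lora_alpha": 32,                      # scaling factor
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],  # attention projections
    "task_type": "CAUSAL_LM",
}

# With peft installed, these would map onto its LoRA API:
#
#   from peft import LoraConfig, get_peft_model
#   config = LoraConfig(**lora_settings)
#   model = get_peft_model(base_model, config)
```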

Intended Use

This model is suitable for various chat-based applications where a Llama-architecture foundation is desired, particularly in scenarios where efficient resource utilization through 4-bit quantization is beneficial.
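
For chat use, a prompt typically needs to follow the base model's chat template. A Llama-2-style template is assumed below (the card does not specify the format this model expects), and the generation call is shown only as a commented sketch.

```python
def format_llama_chat(user_message: str,
                      system_prompt: str = "You are a helpful assistant.") -> str:
    # Llama-2-style chat template (assumption: this model follows it).
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = format_llama_chat("Summarize the benefits of 4-bit quantization.")

# With transformers installed, generation would look roughly like:
#
#   from transformers import pipeline
#   chat = pipeline(
#       "text-generation",
#       model="nikinetrahutama/afx-ai-llama-chat-model-7",
#   )
#   print(chat(prompt, max_new_tokens=128)[0]["generated_text"])
```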