mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k

The mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k model is an AWQ-quantized version of ELYZA's 7-billion-parameter, Llama 2-based, instruction-tuned model for Japanese language tasks. It was created by mmnga using a Japanese calibration set built from 100k random samples of Japanese Wikipedia data and roughly 200 input/output pairs from ELYZA-tasks-100. Its primary differentiator is this Japanese-specific AWQ calibration set, which aims to preserve the weights most important for Japanese text generation while reducing model size.


Overview

This model, mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k, is an AWQ (Activation-aware Weight Quantization) version of the ELYZA-japanese-Llama-2-7b-fast-instruct model. It is based on the Llama 2 architecture and has been instruction-tuned for Japanese language tasks.

Key Differentiators

  • Japanese-specific AWQ Quantization: Unlike standard AWQ models, this version utilizes a calibration set specifically curated from Japanese data. This set includes 100,000 random samples from izumi-lab/wikipedia-ja-20230720 and approximately 200 input/output pairs from ELYZA-tasks-100.
  • Optimized for Japanese Performance: The use of a Japanese calibration set aims to detect and protect critical weights during quantization, potentially leading to better performance for Japanese language generation compared to models quantized with generic or English-centric calibration data.
  • Reduced Resource Footprint: As an AWQ quantized model, it has a smaller memory footprint and faster inference than the full-precision original, making it easier to deploy in resource-constrained environments.
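To make the calibration-set idea above concrete, here is a minimal sketch of how such a set might be assembled and fed to AutoAWQ's `quantize` API. The toy corpus, model identifiers, and output path are illustrative assumptions, not the author's actual script; the real run would draw from izumi-lab/wikipedia-ja-20230720 and ELYZA-tasks-100 and requires a GPU.

```python
import random


def sample_calibration_set(corpus, n_samples, extra_pairs=(), seed=42):
    """Sketch of building an AWQ calibration set: random corpus samples
    plus task-specific input/output pairs (mirroring the 100k Wikipedia
    samples + ~200 ELYZA-tasks-100 pairs described above)."""
    rng = random.Random(seed)
    picked = rng.sample(corpus, min(n_samples, len(corpus)))
    return picked + list(extra_pairs)


# Toy stand-ins; the real set would stream izumi-lab/wikipedia-ja-20230720
# and ELYZA-tasks-100 instead of these placeholder strings.
toy_corpus = [f"記事 {i} の本文…" for i in range(1_000)]
toy_tasks = ["指示: 要約してください。 応答: …"] * 3
calib_data = sample_calibration_set(toy_corpus, 100, toy_tasks)

if __name__ == "__main__":
    # Hedged sketch of the actual 4-bit AWQ quantization with AutoAWQ
    # (`pip install autoawq`); needs a CUDA GPU and downloads the base model.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    base_id = "elyza/ELYZA-japanese-Llama-2-7b-fast-instruct"
    model = AutoAWQForCausalLM.from_pretrained(base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model.quantize(
        tokenizer,
        quant_config={"zero_point": True, "q_group_size": 128,
                      "w_bit": 4, "version": "GEMM"},
        calib_data=calib_data,  # AutoAWQ accepts a list of raw text strings
    )
    model.save_quantized("elyza-7b-fast-instruct-awq")  # hypothetical path
```

The quantization step itself runs only under the `__main__` guard, since it requires GPU hardware; the sampling helper is pure Python and can be reused as-is.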

Usage Considerations

  • Hardware Requirements: The model card's usage examples run on an A100 GPU via Google Colab; AWQ inference generally requires a CUDA-capable GPU.
  • Comparison: A GPTQ quantized counterpart, mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-GPTQ-calib-ja-2k, is also available for comparison, offering different quantization trade-offs.
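For reference, a hedged inference sketch using AutoAWQ's `from_quantized` loader is shown below. The prompt format follows the Llama 2 chat template documented for the ELYZA base model; the example user text is illustrative, and the model-loading portion assumes a CUDA GPU.

```python
# Llama 2 chat-template markers, as documented for the ELYZA base model.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"


def build_prompt(user_text: str, bos_token: str = "<s>") -> str:
    """Assemble a single-turn chat prompt in the ELYZA/Llama 2 format."""
    return (f"{bos_token}{B_INST} {B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}"
            f"{user_text} {E_INST} ")


if __name__ == "__main__":
    # Hedged sketch: loading the 4-bit model with AutoAWQ; requires a
    # CUDA GPU (the card's examples use an A100 in Colab).
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_id = "mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)

    prompt = build_prompt("日本の四季について簡潔に説明してください。",
                          tokenizer.bos_token)
    # BOS is already in the prompt string, so skip special-token insertion.
    inputs = tokenizer(prompt, add_special_tokens=False,
                       return_tensors="pt").input_ids.cuda()
    output = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Keeping the prompt builder separate from the GPU-bound loading code lets the template be verified without downloading the model.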