mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k

The mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k model is an AWQ-quantized version of ELYZA's 7-billion-parameter, Llama 2-based, instruction-tuned model for Japanese language tasks. It was created by mmnga using a Japanese calibration set built from 100k random samples of Japanese Wikipedia data and roughly 200 input/output pairs from ELYZA-tasks-100. Its primary differentiator is this Japanese-specific AWQ calibration set, which aims to preserve the weights most important for Japanese text generation while reducing model size.


Overview

This model, mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k, is an AWQ (Activation-aware Weight Quantization) version of the ELYZA-japanese-Llama-2-7b-fast-instruct model. It is based on the Llama 2 architecture and has been instruction-tuned for Japanese language tasks.

Key Differentiators

  • Japanese-specific AWQ Quantization: Unlike standard AWQ models, this version utilizes a calibration set specifically curated from Japanese data. This set includes 100,000 random samples from izumi-lab/wikipedia-ja-20230720 and approximately 200 input/output pairs from ELYZA-tasks-100.
  • Optimized for Japanese Performance: The use of a Japanese calibration set aims to detect and protect critical weights during quantization, potentially leading to better performance for Japanese language generation compared to models quantized with generic or English-centric calibration data.
  • Reduced Resource Footprint: As an AWQ quantized model, it has a smaller memory footprint and faster inference than the full-precision original, making it easier to deploy in resource-constrained environments.
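To make the calibration-set idea above concrete, here is a minimal sketch of how such a set might be assembled and fed to AutoAWQ's `quantize` API. The toy corpus, model identifiers, and output path are illustrative assumptions, not the author's actual script; the real run would draw from izumi-lab/wikipedia-ja-20230720 and ELYZA-tasks-100 and requires a GPU.

```python
import random


def sample_calibration_set(corpus, n_samples, extra_pairs=(), seed=42):
    """Sketch of building an AWQ calibration set: random corpus samples
    plus task-specific input/output pairs (mirroring the 100k Wikipedia
    samples + ~200 ELYZA-tasks-100 pairs described above)."""
    rng = random.Random(seed)
    picked = rng.sample(corpus, min(n_samples, len(corpus)))
    return picked + list(extra_pairs)


# Toy stand-ins; the real set would stream izumi-lab/wikipedia-ja-20230720
# and ELYZA-tasks-100 instead of these placeholder strings.
toy_corpus = [f"記事 {i} の本文…" for i in range(1_000)]
toy_tasks = ["指示: 要約してください。 応答: …"] * 3
calib_data = sample_calibration_set(toy_corpus, 100, toy_tasks)

if __name__ == "__main__":
    # Hedged sketch of the actual 4-bit AWQ quantization with AutoAWQ
    # (`pip install autoawq`); needs a CUDA GPU and downloads the base model.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    base_id = "elyza/ELYZA-japanese-Llama-2-7b-fast-instruct"
    model = AutoAWQForCausalLM.from_pretrained(base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model.quantize(
        tokenizer,
        quant_config={"zero_point": True, "q_group_size": 128,
                      "w_bit": 4, "version": "GEMM"},
        calib_data=calib_data,  # AutoAWQ accepts a list of raw text strings
    )
    model.save_quantized("elyza-7b-fast-instruct-awq")  # hypothetical path
```

The quantization step itself runs only under the `__main__` guard, since it requires GPU hardware; the sampling helper is pure Python and can be reused as-is.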

Usage Considerations

  • Hardware Requirements: The model card's usage examples run on an A100 GPU via Google Colab; AWQ inference generally requires a CUDA-capable GPU.
  • Comparison: A GPTQ quantized counterpart, mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-GPTQ-calib-ja-2k, is also available for comparison, offering different quantization trade-offs.
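For reference, a hedged inference sketch using AutoAWQ's `from_quantized` loader is shown below. The prompt format follows the Llama 2 chat template documented for the ELYZA base model; the example user text is illustrative, and the model-loading portion assumes a CUDA GPU.

```python
# Llama 2 chat-template markers, as documented for the ELYZA base model.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"


def build_prompt(user_text: str, bos_token: str = "<s>") -> str:
    """Assemble a single-turn chat prompt in the ELYZA/Llama 2 format."""
    return (f"{bos_token}{B_INST} {B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}"
            f"{user_text} {E_INST} ")


if __name__ == "__main__":
    # Hedged sketch: loading the 4-bit model with AutoAWQ; requires a
    # CUDA GPU (the card's examples use an A100 in Colab).
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_id = "mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-AWQ-calib-ja-100k"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)

    prompt = build_prompt("日本の四季について簡潔に説明してください。",
                          tokenizer.bos_token)
    # BOS is already in the prompt string, so skip special-token insertion.
    inputs = tokenizer(prompt, add_special_tokens=False,
                       return_tensors="pt").input_ids.cuda()
    output = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Keeping the prompt builder separate from the GPU-bound loading code lets the template be verified without downloading the model.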