PKU-DS-LAB/Fairy2i-W2

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · License: llama2 · Architecture: Transformer · Open Weights

Fairy2i-W2 by PKU-DS-LAB is a 7 billion parameter language model based on LLaMA-2, quantized to an effective 2 bits per weight through a novel complex-valued quantization framework. It transforms pre-trained real-valued layers into a widely-linear complex form, enabling extremely low-bit quantization while reusing existing checkpoints. This model is optimized for efficient inference on commodity hardware, achieving performance close to full-precision baselines.


Overview

PKU-DS-LAB's Fairy2i-W2 is a 7 billion parameter language model built upon the LLaMA-2 architecture, distinguished by its innovative approach to extreme low-bit quantization. It introduces Fairy2i, a universal framework that converts pre-trained real-valued layers into an equivalent widely-linear complex form, allowing for highly efficient 2-bit quantization without retraining from scratch.
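The paper's exact block structure is not spelled out here, but the transformation rests on a standard identity: any real linear map on paired real dimensions can be rewritten exactly as a widely-linear complex map y = Wz + V·conj(z). A minimal numpy sketch of that identity (toy sizes and the block pairing are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # half the real dimension; hypothetical toy size

# A real linear layer y = A @ x on R^{2n}, partitioned into n x n blocks.
A = rng.standard_normal((2 * n, 2 * n))
A11, A12 = A[:n, :n], A[:n, n:]
A21, A22 = A[n:, :n], A[n:, n:]

# Equivalent widely-linear complex map y = W z + V conj(z), with z = x1 + i*x2.
W = 0.5 * ((A11 + A22) + 1j * (A21 - A12))
V = 0.5 * ((A11 - A22) + 1j * (A21 + A12))

x = rng.standard_normal(2 * n)
x1, x2 = x[:n], x[n:]
z = x1 + 1j * x2

y_real = A @ x                      # original real layer
y_complex = W @ z + V @ np.conj(z)  # widely-linear complex form

# The two computations agree exactly (up to float rounding),
# which is why the transformation is lossless before quantization.
assert np.allclose(y_real[:n] + 1j * y_real[n:], y_complex)
```

Because the rewrite is exact, the complex-form model reproduces the pre-trained checkpoint's outputs before any quantization is applied.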

Key Capabilities & Innovations

  • Lossless Widely-Linear Transformation: Converts real-valued linear layers into complex form while preserving original model behavior before quantization.
  • Phase-Aware Complex Quantization: Utilizes a unique codebook of fourth roots of unity ({±1, ±i}) for quantizing complex weights, maintaining full-precision master weights during Quantization-Aware Training (QAT).
  • Recursive Residual Quantization: Employs a two-stage recursive mechanism to iteratively minimize quantization error, achieving an effective 2 bits per real parameter for Fairy2i-W2.
  • Performance: On LLaMA-2 7B, Fairy2i-W2 (2-bit) achieves a perplexity of 7.85 and an average zero-shot accuracy of 62.00%, closely matching FP16 performance (6.63 perplexity, 64.72% accuracy) and significantly outperforming other 2-bit real-valued quantization methods like AQLM and QuIP#.
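The quantization and residual steps above can be sketched in a few lines. Fairy2i's actual scaling granularity (per-channel vs. per-tensor) and codebook-assignment details are not given here, so the per-tensor scale and helper names below are illustrative assumptions; the mechanism shown is the one the bullets describe: snap each complex weight to a scaled fourth root of unity, then quantize the residual with a second stage.

```python
import numpy as np

CODEBOOK = np.array([1, -1, 1j, -1j])  # fourth roots of unity {±1, ±i}

def quantize_phase(w):
    """Snap each complex weight to its nearest fourth root of unity.

    For unit-modulus codewords, nearest-in-distance is the codeword
    maximizing Re(conj(c) * w)."""
    idx = np.argmax(np.real(np.conj(CODEBOOK)[:, None] * w.ravel()), axis=0)
    code = CODEBOOK[idx].reshape(w.shape)
    # Per-tensor least-squares scale minimizing ||w - s*code||^2 (assumed).
    scale = np.real(np.vdot(code, w)) / code.size
    return scale, code

def recursive_residual_quantize(w, stages=2):
    """Two-stage recursive residual quantization: w ≈ s1*c1 + s2*c2.

    Each stage stores 2 bits per complex weight, and one complex weight
    packs two real parameters, so two stages give an effective
    2 bits per real parameter (the W2 configuration)."""
    approx = np.zeros_like(w)
    for _ in range(stages):
        s, c = quantize_phase(w - approx)
        approx = approx + s * c
    return approx

rng = np.random.default_rng(0)
w = rng.standard_normal(256) + 1j * rng.standard_normal(256)
err1 = np.linalg.norm(w - recursive_residual_quantize(w, 1)) / np.linalg.norm(w)
err2 = np.linalg.norm(w - recursive_residual_quantize(w, 2)) / np.linalg.norm(w)
assert err2 < err1  # the second residual stage shrinks quantization error
```

In QAT, this quantizer would be applied in the forward pass while gradients update the full-precision master weights, as the second bullet notes.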

When to Use This Model

Fairy2i-W2 is ideal for scenarios requiring highly efficient inference of large language models on resource-constrained hardware. Its ability to reach near full-precision performance at an effective 2-bit precision makes it suitable for deploying LLaMA-2 7B where memory and compute budgets are tight. It bridges the gap between the efficiency of complex-valued arithmetic and the practical utility of existing pre-trained real-valued models.
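To make the memory argument concrete, a back-of-the-envelope estimate for weight storage alone (illustrative arithmetic; real checkpoints add quantization scales, embeddings, and other overhead):

```python
# Weight-memory estimate for a 7B-parameter model at different precisions.
params = 7e9

fp16_gb = params * 16 / 8 / 1e9   # 16 bits per weight
w2_gb = params * 2 / 8 / 1e9      # effective 2 bits per weight

print(f"FP16: {fp16_gb:.2f} GB, Fairy2i-W2: {w2_gb:.2f} GB "
      f"({fp16_gb / w2_gb:.0f}x smaller)")
# → FP16: 14.00 GB, Fairy2i-W2: 1.75 GB (8x smaller)
```

At roughly 1.75 GB of weights, the model fits comfortably in the memory of commodity consumer GPUs and many CPUs.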