Mrw33554432/bitLinear-phi-1.5
Mrw33554432/bitLinear-phi-1.5 is a 1.4 billion parameter causal language model based on the phi-1.5 architecture, partially quantized using the 1-bit method described in "The Era of 1-bit LLMs." Quantization is applied only to the weights of its linear layers, leaving other components untouched, in order to isolate the impact of binary weight quantization. The model was trained on a subset of the Wikipedia dataset for research validation, with a focus on exploring efficient model architectures.
Overview
Mrw33554432/bitLinear-phi-1.5 is a 1.4 billion parameter language model built upon the phi-1.5 architecture. Its core innovation is a partial implementation of the BitLinear quantization method: 1-bit quantization is applied to the weights of its linear layers (excluding the lm_head). This approach aims to isolate and evaluate the effectiveness of binary weight quantization as described in the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (arXiv:2402.17764), without incorporating other components such as RMSNorm or activation quantization.
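To make the approach concrete, below is a minimal sketch of a BitLinear-style layer in PyTorch. It binarizes weights to {-1, +1} with a per-tensor scale and uses a straight-through estimator for training; the layer shipped with this repository may differ in details such as centering, scaling granularity, or the STE formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Linear layer with 1-bit (sign) weight quantization.

    A sketch in the spirit of BitLinear, not the exact layer in this repo.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor scale so the binarized weights keep the
        # average magnitude of the latent full-precision weights.
        scale = w.abs().mean()
        # Binarize to {-scale, +scale} (sign(0) stays 0, which is rare).
        w_q = torch.sign(w) * scale
        # Straight-through estimator: the forward pass uses w_q, while
        # gradients flow to the latent full-precision weights.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)
```

Replacing the standard nn.Linear modules of phi-1.5 (except the lm_head) with such a layer reproduces the setup described above.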
Key Characteristics
- Architecture: Based on Microsoft's phi-1.5, with custom BitLinear layers replacing standard linear layers.
- Quantization: Implements 1-bit quantization for the weights of its linear layers (all except the lm_head), focusing on efficiency research.
- Training Data: Trained on a small subset (100,000 samples) of the English Wikipedia dataset for research validation.
- Performance Note: The current kernel is not optimized for 1-bit matrix operations, so inference is slower than the stock model. Roughly 3x faster inference is possible with a custom kernel available in the project's GitHub repository.
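For reference, a hypothetical loading snippet using the standard transformers API is shown below. The model id is real, but whether this checkpoint requires trust_remote_code=True for its custom BitLinear layers is an assumption; check the repository files before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mrw33554432/bitLinear-phi-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is assumed here because the repo replaces
# standard linear layers with custom BitLinear modules.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```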
Research Focus
This model serves as a research vehicle for understanding the implications and performance of 1-bit weight quantization in LLMs. It highlights the potential for a reduced memory footprint and lower computational cost, though inference speed is currently limited by the unoptimized kernel. Developers exploring efficient model architectures and quantization techniques will find it particularly relevant.
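As a rough illustration of the memory argument, the weight-only footprint can be estimated as follows. The calculation assumes all 1.4B parameters are binarized, which overstates the savings since the lm_head and non-linear components remain in higher precision.

```python
# Back-of-the-envelope weight memory, ignoring activations and any
# layers kept in full precision. Illustrative only.
n_params = 1.4e9

fp16_gb = n_params * 2 / 1e9      # 16 bits = 2 bytes per weight
onebit_gb = n_params / 8 / 1e9    # 1 bit per weight

print(f"fp16 weights:  ~{fp16_gb:.2f} GB")    # ~2.80 GB
print(f"1-bit weights: ~{onebit_gb:.2f} GB")  # ~0.18 GB
```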