NICKO/phi-4-BonfyreFPQ3
NICKO/phi-4-BonfyreFPQ3 is a 14.7-billion-parameter language model based on the phi-4 architecture and developed by NICKO. It is distinguished by the BonfyreFPQ compression method, which combines a novel weight algebra with multi-scale encoding to reach approximately 4 bits per weight while keeping per-weight cosine similarity at ~0.9999. The model ships in BF16 safetensors format, so it loads directly as a drop-in replacement in standard PyTorch or Hugging Face environments. Its primary use case is efficient deployment of a highly compressed yet performant language model.
NICKO/phi-4-BonfyreFPQ3: Highly Compressed Language Model
NICKO/phi-4-BonfyreFPQ3 is a 14.7 billion parameter model utilizing the phi-4 architecture, developed by NICKO. Its core innovation lies in the BonfyreFPQ v9/v10 compression method, which employs a unique Bonfyre Weight Algebra to significantly reduce model size while preserving quality.
Key Compression Details
- Decomposition: Each weight matrix is split using truncated SVD into two terms, W = L + R.
- Pruning: The 'R' component undergoes hybrid structure-aware pruning.
- Correction: Curl and divergence energy corrections are applied.
- Encoding: FPQ v9 multi-scale encoding (LR + E8 + RVQ + QJL + Ghost) is used to achieve high compression.
- Output Format: The model is provided in BF16 safetensors format, designed for direct loading without special loaders, making it a drop-in replacement for standard models.
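The decomposition step above can be illustrated with a small NumPy sketch. It assumes that L is the low-rank term kept by the truncated SVD and that R is the residual left over; the model card states only W = L + R, so this reading, the function name `low_rank_plus_residual`, and the chosen rank are illustrative assumptions and not the BonfyreFPQ implementation. Pruning, energy correction, and FPQ encoding are not shown.

```python
import numpy as np


def low_rank_plus_residual(W, rank):
    """Split W into a rank-`rank` term L and a residual R so that W = L + R.

    Assumption: L is the truncated-SVD approximation and R is what remains.
    """
    # Thin SVD: W = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the `rank` largest singular triplets.
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Everything the low-rank term does not capture.
    R = W - L
    return L, R


# Toy matrix standing in for one weight tensor (hypothetical shape).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
L, R = low_rank_plus_residual(W, rank=8)
```

By construction `L + R` reproduces `W` up to floating-point error, so any loss in a scheme like this would come from how the two terms are later pruned and encoded, not from the split itself.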
Quality & Performance
- The compression method achieves approximately 4 bits per weight.
- It maintains a per-weight cosine similarity of ~0.9999 with the original weights, which suggests minimal quality degradation despite the significant compression.
- Verified benchmarks are available for review.
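The two figures quoted above can be made concrete with a short sketch. `cosine_similarity` computes the cosine between two flattened weight tensors, and `weight_bytes` converts a bits-per-weight rate into a storage size. The `uniform_quantize` helper is a plain round-to-nearest 4-bit quantizer used only as a stand-in to produce an approximate tensor; it is not BonfyreFPQ and will not reproduce the ~0.9999 figure. All function names here are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two tensors, treated as flat vectors."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def uniform_quantize(W, bits):
    """Round-to-nearest uniform quantizer over [min, max]. Stand-in only."""
    levels = 2 ** bits - 1
    lo, hi = W.min(), W.max()
    scale = (hi - lo) / levels
    return np.round((W - lo) / scale) * scale + lo


def weight_bytes(n_params, bits_per_weight):
    """Storage needed for n_params weights at the given rate, in bytes."""
    return n_params * bits_per_weight / 8


# Toy tensor standing in for one layer's weights (hypothetical shape).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W4 = uniform_quantize(W, bits=4)
sim = cosine_similarity(W, W4)

# Arithmetic on the quoted numbers, not measured file sizes.
bf16_bytes = weight_bytes(14_700_000_000, 16)  # 29.4e9 bytes
fpq_bytes = weight_bytes(14_700_000_000, 4)    # 7.35e9 bytes
```

For 14.7 billion parameters, 16 bits per weight works out to 29.4 GB and 4 bits per weight to 7.35 GB, which is the scale of saving a ~4-bit encoding implies.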
When to Use This Model
This model is particularly suitable for applications requiring:
- Efficient deployment: Its high compression ratio allows for reduced memory footprint and faster loading.
- Resource-constrained environments: Ideal for scenarios where computational or storage resources are limited.
- Standard integration: Its BF16 safetensors format ensures compatibility with existing PyTorch, diffusers, and Hugging Face workflows.
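Since the card describes the checkpoint as a drop-in replacement that needs no special loader, loading it might look like the sketch below. This assumes the repository id `NICKO/phi-4-BonfyreFPQ3` resolves on the Hugging Face Hub, that `torch` and `transformers` are installed, and that enough memory is available for a 14.7B-parameter BF16 model. The helper names `load_model` and `generate` are hypothetical, and nothing here has been run against the actual checkpoint.

```python
def load_model(repo_id="NICKO/phi-4-BonfyreFPQ3"):
    """Load tokenizer and model with the standard Auto classes.

    Heavy dependencies are imported lazily so that defining this function
    does not itself require torch or transformers.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # Keep the weights in BF16, the format the card says they are stored in.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=64):
    """Generate a completion for a single prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the full checkpoint, so it is left commented out):
# tokenizer, model = load_model()
# print(generate(tokenizer, model, "Explain truncated SVD in one sentence."))
```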