NICKO/phi-4-BonfyreFPQ3

Text Generation · Concurrency Cost: 1 · Model Size: 14.7B · Quant: FP8 · Ctx Length: 32k · Published: Apr 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

NICKO/phi-4-BonfyreFPQ3 is a 14.7-billion-parameter language model based on the phi-4 architecture, developed by NICKO. It is distinguished by its BonfyreFPQ compression method, which uses a novel weight algebra and multi-scale encoding to achieve approximately 4 bits per weight while maintaining a per-weight cosine similarity of ~0.9999. The model is provided in BF16 safetensors format, allowing direct loading as a drop-in replacement in standard PyTorch or Hugging Face environments. Its primary use case is to provide a highly compressed yet performant language model for efficient deployment.
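Because the checkpoint ships as plain BF16 safetensors, loading should need nothing beyond the standard Hugging Face Transformers API. Below is a minimal loading sketch, assuming the repository id resolves to a standard phi-4-style causal LM layout (and that `accelerate` is installed for `device_map="auto"`); this is an illustration, not a recipe verified by the author.

```python
# Minimal loading sketch: the card claims a drop-in BF16 checkpoint,
# so no custom loader or quantization config should be required.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NICKO/phi-4-BonfyreFPQ3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # place layers on available devices
)
```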


NICKO/phi-4-BonfyreFPQ3: Highly Compressed Language Model

NICKO/phi-4-BonfyreFPQ3 is a 14.7-billion-parameter model using the phi-4 architecture, developed by NICKO. Its core innovation is the BonfyreFPQ v9/v10 compression method, which employs a unique Bonfyre Weight Algebra to significantly reduce model size while preserving quality.

Key Compression Details

  • Decomposition: Weights are decomposed via truncated SVD into a low-rank part and a residual (W = L + R); see the sketch after this list.
  • Pruning: The residual component R undergoes hybrid structure-aware pruning.
  • Correction: Curl and divergence energy corrections are applied.
  • Encoding: FPQ v9 multi-scale encoding (LR + E8 + RVQ + QJL + Ghost) is used to achieve high compression.
  • Output Format: The model is provided in BF16 safetensors format, designed for direct loading without special loaders, making it a drop-in replacement for standard models.
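The Bonfyre Weight Algebra itself is not public, so the sketch below only illustrates the generic decompose-and-prune idea behind the first two bullets: plain truncated SVD for the low-rank part L, with simple magnitude pruning standing in for the hybrid structure-aware pruning of R. The curl/divergence corrections and the FPQ v9 encoding stages are not reproduced here.

```python
# Generic decompose-and-prune sketch, NOT the proprietary BonfyreFPQ
# pipeline: truncated SVD yields a low-rank part L, and the residual
# R = W - L is sparsified by simple magnitude pruning.
import torch

def decompose_and_prune(W: torch.Tensor, rank: int, keep: float):
    # Truncated SVD: keep the top-`rank` singular directions as L.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]
    # The residual carries whatever the low-rank part misses.
    R = W - L
    # Keep only the largest-magnitude `keep` fraction of residual entries
    # (a stand-in for Bonfyre's structure-aware pruning).
    threshold = torch.quantile(R.abs().flatten(), 1.0 - keep)
    R_pruned = torch.where(R.abs() >= threshold, R, torch.zeros_like(R))
    return L, R_pruned

W = torch.randn(512, 512)
L, R = decompose_and_prune(W, rank=64, keep=0.05)
print(f"relative reconstruction error: {(W - (L + R)).norm() / W.norm():.4f}")
```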

Quality & Performance

  • The compression method achieves approximately 4 bits per weight.
  • It maintains a per-weight cosine similarity of ~0.9999 with the original weights, indicating minimal quality degradation despite the significant compression (see the check sketched after this list).
  • Verified benchmarks are available for review.
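The ~0.9999 figure is a per-tensor comparison between original and reconstructed weights, and the check itself is easy to reproduce. A small sketch, using synthetic stand-in tensors rather than actual model weights:

```python
# Sketch of a per-tensor cosine-similarity check; the tensors here are
# illustrative stand-ins, not weights from the actual checkpoint.
import torch
import torch.nn.functional as F

def tensor_cosine(original: torch.Tensor, reconstructed: torch.Tensor) -> float:
    # Flatten both tensors and compare them as single vectors.
    return F.cosine_similarity(
        original.flatten().unsqueeze(0),
        reconstructed.flatten().unsqueeze(0),
    ).item()

W = torch.randn(1024, 1024)
W_hat = W + 0.014 * torch.randn_like(W)  # stand-in for a lossy reconstruction
print(f"cosine similarity: {tensor_cosine(W, W_hat):.4f}")  # ~0.9999
```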

When to Use This Model

This model is particularly suitable for applications requiring:

  • Efficient deployment: Its high compression ratio allows for reduced memory footprint and faster loading.
  • Resource-constrained environments: Ideal for scenarios where computational or storage resources are limited.
  • Standard integration: Its BF16 safetensors format ensures compatibility with existing PyTorch and Hugging Face Transformers workflows.
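Inference then follows the usual Transformers generation flow. A brief, hypothetical continuation, assuming the `model` and `tokenizer` from the loading sketch above are in scope:

```python
# Standard Transformers generation; nothing model-specific is assumed
# beyond the loading sketch earlier in this card.
prompt = "Explain truncated SVD in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```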