IntelLabs/sqft-phi-3-mini-4k-40-base
Text generation · Model size: 4B · Precision: BF16 · Context length: 4k · License: apache-2.0 · Architecture: Transformer · Open weights

IntelLabs/sqft-phi-3-mini-4k-40-base is a 4-billion-parameter language model derived from Microsoft's Phi-3-mini-4k-instruct. It incorporates 40% sparsity induced with the Wanda pruning method, making it a sparse foundation model intended for low-cost adaptation in low-precision, sparse settings. Its primary strength is efficient inference and deployment in resource-constrained applications.


Overview

The IntelLabs/sqft-phi-3-mini-4k-40-base is a 4-billion-parameter language model developed by IntelLabs. It is based on the microsoft/Phi-3-mini-4k-instruct architecture and features a significant modification: 40% sparsity applied using the Wanda pruning method. This model is specifically designed for efficient deployment and adaptation, as detailed in the research paper "SQFT: Low-cost Model Adaptation in Low-precision Sparse Foundation Models" (arXiv link).

Key Characteristics

  • Source Model: Built upon Microsoft's Phi-3-mini-4k-instruct.
  • Sparsity: Achieves 40% sparsity through the Wanda pruning technique, enhancing efficiency.
  • Parameter Count: 4 billion parameters.
  • Context Length: Supports a 4096-token context window.
  • Quantization: No additional quantization applied beyond its sparse structure.
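To make the Wanda pruning step concrete, the sketch below applies the Wanda importance score (weight magnitude scaled by the per-input-feature activation norm) to a dummy weight matrix and zeroes the lowest-scoring 40% of weights in each output row. This is a minimal NumPy illustration of the criterion, not the actual pipeline used to produce this checkpoint; the `wanda_prune` function, matrix shapes, and random data are assumptions for demonstration only.

```python
import numpy as np

def wanda_prune(weight: np.ndarray, act_norm: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the lowest-scoring weights in each output row.

    The score for weight W[i, j] is |W[i, j]| * act_norm[j], following the
    Wanda criterion; in practice act_norm holds per-input-feature activation
    norms collected on calibration data.
    """
    score = np.abs(weight) * act_norm[None, :]   # (out, in) importance scores
    k = int(weight.shape[1] * sparsity)          # weights to drop per row
    pruned = weight.copy()
    drop = np.argsort(score, axis=1)[:, :k]      # k smallest scores per row
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 10))                 # dummy dense weight matrix
norms = rng.uniform(0.5, 2.0, size=10)           # dummy activation norms
W40 = wanda_prune(W, norms, 0.40)
print((W40 == 0).mean())                         # 0.4 — each row is 40% zeros
```

Note the per-row pruning: unlike plain magnitude pruning, the activation scaling lets small weights survive when they feed highly active inputs.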

Intended Use Cases

This model is particularly well-suited for scenarios requiring:

  • Resource-constrained environments: Its sparse nature allows for more efficient inference and deployment.
  • Low-cost model adaptation: Designed for scenarios where adapting foundation models with limited computational resources is crucial.
  • Research into sparse and efficient LLMs: Serves as a practical example for exploring the benefits of sparsity in language models.
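When experimenting with sparse checkpoints like this one, a common first step is verifying the effective sparsity of the weights. The sketch below computes the fraction of exactly-zero entries per tensor and overall; the tiny hand-built `state_dict` is a hypothetical stand-in, and with the real model you would iterate over its actual state dict instead.

```python
import numpy as np

# Hypothetical stand-in for a checkpoint's state dict; substitute the
# tensors from the real model (e.g. its safetensors shards) in practice.
state_dict = {
    "layers.0.mlp.weight": np.array([[0.0, 1.2, 0.0, -0.5],
                                     [0.3, 0.0, 0.0, 0.9]]),
    "layers.0.attn.weight": np.zeros((2, 4)),
}

def sparsity_report(tensors):
    """Return the fraction of exactly-zero entries per tensor and overall."""
    per_tensor = {name: float((t == 0).mean()) for name, t in tensors.items()}
    total_zeros = sum(int((t == 0).sum()) for t in tensors.values())
    total = sum(t.size for t in tensors.values())
    return per_tensor, total_zeros / total

per_tensor, overall = sparsity_report(state_dict)
print(per_tensor)   # {'layers.0.mlp.weight': 0.5, 'layers.0.attn.weight': 1.0}
print(overall)      # 0.75
```

For the actual checkpoint, an overall figure near 0.40 in the pruned linear layers would confirm the advertised 40% sparsity.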