olusegunola/phi-1.5-distill-Ablation_No_L2_Norm-merged
Text Generation · Concurrency Cost: 1 · Model Size: 1.4B · Quant: BF16 · Ctx Length: 2k · Published: Mar 22, 2026 · Architecture: Transformer

olusegunola/phi-1.5-distill-Ablation_No_L2_Norm-merged is a 1.4-billion-parameter language model, likely based on Microsoft's Phi-1.5 architecture, with a 2048-token context length. As the name indicates, it is an ablation-study variant in which L2 normalization was removed during distillation or training; its primary purpose is to measure what that normalization contributes to model performance and behavior.


Model Overview

This model is a 1.4-billion-parameter distilled language model with a 2048-token context window. It is an ablation-study variant that removes L2 normalization during distillation or training; the "-merged" suffix likely indicates that fine-tuned weights (for example, from a LoRA adapter) have been merged back into the base checkpoint.
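The card gives no usage instructions, so the following is a minimal sketch of loading and sampling from the checkpoint with Hugging Face transformers. It assumes the repository exists on the Hub under this identifier and is compatible with the standard causal-LM auto classes, as Phi-1.5 derivatives generally are:

```python
# Minimal sketch: load the checkpoint and generate text with Hugging Face
# transformers. Assumes the repo id below is published on the Hub and exposes
# a standard causal-LM config; adjust if the actual card says otherwise.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "olusegunola/phi-1.5-distill-Ablation_No_L2_Norm-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```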

Key Characteristics

  • Parameter Count: 1.4 billion parameters.
  • Context Length: Supports a context window of 2048 tokens.
  • Ablation Study: L2 normalization has been intentionally excluded, presumably to study its impact on model behavior and performance (see the sketch after this list for where such a norm commonly appears).
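
The card does not specify where the removed L2 normalization was applied. One common place such a norm appears in distillation pipelines is on hidden states before a teacher-student alignment loss; the sketch below is purely illustrative of that pattern and is not confirmed by the model card. The hidden_state_loss function and the use_l2_norm flag are invented for illustration:

```python
# Illustrative sketch only: one common role for L2 normalization in a
# distillation pipeline is normalizing hidden states before a cosine-style
# alignment loss against the teacher. The model card does not confirm this is
# where the ablation removed it.
import torch
import torch.nn.functional as F

def hidden_state_loss(student_h: torch.Tensor,
                      teacher_h: torch.Tensor,
                      use_l2_norm: bool = True) -> torch.Tensor:
    """MSE between student and teacher hidden states, optionally L2-normalized."""
    if use_l2_norm:
        student_h = F.normalize(student_h, p=2, dim=-1)
        teacher_h = F.normalize(teacher_h, p=2, dim=-1)
    return F.mse_loss(student_h, teacher_h)

# The "No_L2_Norm" ablation would correspond to use_l2_norm=False.
```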

Intended Use Cases

Given the nature of this model as an ablation study, its primary utility is likely for:

  • Research and Experimentation: Investigating the role and necessity of L2 normalization in language model training and distillation.
  • Comparative Analysis: Serving as a baseline or comparison point against models trained with L2 normalization, to quantify its contribution (a hypothetical perplexity comparison is sketched below).
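
As a concrete example of such a comparison, the sketch below computes perplexity on a sample text. The baseline repository id is hypothetical, since the card names no companion checkpoint:

```python
# Hedged sketch: compare perplexity of the ablation model against a baseline
# trained with L2 normalization. The baseline repo id is hypothetical;
# substitute whatever companion checkpoint the ablation should be compared to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels equal to the input ids makes the model return the LM loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

sample = "The quick brown fox jumps over the lazy dog."
ablation_ppl = perplexity("olusegunola/phi-1.5-distill-Ablation_No_L2_Norm-merged", sample)
# baseline_ppl = perplexity("olusegunola/phi-1.5-distill-baseline-merged", sample)  # hypothetical id
print(f"No-L2-norm ablation perplexity: {ablation_ppl:.2f}")
```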

Further details regarding specific performance metrics, training data, or broader applications are not available in the provided model card.