Mihaiii/Cluj-Napoca-0.2
Mihaiii/Cluj-Napoca-0.2 is a 34-billion-parameter experimental language model developed by Mihaiii and derived from Mihaiii/Pallas-0.5. It was created by layer pruning with laserRMT and mergekit: the layers whose self-attention value projections have an infinite signal-to-noise ratio were removed. It demonstrates a method for reducing model complexity with the aim of maintaining or improving performance, and is part of a series exploring iterative fine-tuning and pruning strategies.
Overview
Mihaiii/Cluj-Napoca-0.2 is a 34-billion-parameter experimental language model developed by Mihaiii. It is part of the "Cluj-Napoca" series, which explores model optimization through layer pruning and iterative fine-tuning. This version, 0.2, is derived from the Mihaiii/Pallas-0.5 model.
Key Characteristics
- Layer Pruning: The model is created by systematically eliminating specific layers from its base model, Mihaiii/Pallas-0.5. The process identifies and removes layers where the signal-to-noise ratio (SNR) of the self_attn.v_proj component is infinite, indicating potentially redundant or low-contribution layers.
- Methodology: The pruning uses the laserQlora.ipynb script from cognitivecomputations/laserRMT to calculate the SNR for each layer, followed by mergekit to construct the pruned model from the identified layers.
- Experimental Series: Cluj-Napoca-0.2 is an early iteration in a series of models. Subsequent versions (0.3-0.5 and 0.7-0.11) are each fine-tuned from the preceding version, while 0.6 is a further-pruned version of 0.5.
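The SNR screening step can be illustrated with a short sketch. This is not laserRMT's actual code. It assumes a heuristic in the spirit of that approach: singular values at or below a Marchenko-Pastur-style cutoff, sigma * (1 + sqrt(beta)), count as noise, with sigma estimated from the interquartile range of the singular values. When no singular value falls at or below the cutoff, the noise sum is zero and the SNR is reported as infinite. The function names are hypothetical. The inputs are assumed to be each layer's self_attn.v_proj weight matrix, already converted to a NumPy array.

```python
import numpy as np


def mp_threshold(sigma: float, n: int, m: int) -> float:
    """Marchenko-Pastur-style cutoff sigma * (1 + sqrt(beta)); an assumed heuristic form."""
    beta = min(n, m) / max(n, m)  # aspect ratio of the matrix, in (0, 1]
    return float(sigma * (1.0 + np.sqrt(beta)))


def layer_snr(weight: np.ndarray) -> float:
    """Signal-to-noise ratio of one weight matrix's singular-value spectrum."""
    s = np.linalg.svd(weight, compute_uv=False)  # singular values, descending
    q75, q25 = np.percentile(s, [75, 25])
    sigma = (q75 - q25) / 1.349  # IQR-based noise-scale estimate (assumption)
    thr = mp_threshold(sigma, *weight.shape)
    signal = s[s > thr].sum()  # values above the cutoff count as signal
    noise = s[s <= thr].sum()  # values at or below the cutoff count as noise
    return float("inf") if noise == 0 else float(signal / noise)


def layers_to_drop(v_proj_weights: list[np.ndarray]) -> list[int]:
    """Indices of layers whose v_proj SNR is infinite (no noise singular values)."""
    return [i for i, w in enumerate(v_proj_weights) if np.isinf(layer_snr(w))]
```

Under this heuristic, a matrix whose singular values are all equal, such as an identity or other orthogonal matrix, has zero interquartile range and therefore an infinite SNR. A random Gaussian matrix has a spread-out spectrum and gets a finite SNR.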
Intended Use
This model is primarily intended for researchers and developers interested in:
- Model Compression Techniques: Studying the effects and efficacy of layer pruning based on signal-to-noise ratio analysis.
- Experimental LLM Development: Exploring iterative fine-tuning and pruning strategies for large language models.
- Replication of Pruning Methods: The README provides detailed steps and configurations to replicate the pruning process used to create this model, making it a valuable resource for understanding and applying these techniques.
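In the mergekit step, the layers that survive pruning can be written as contiguous slices of the base model in a passthrough merge. The sketch below turns a list of dropped layer indices into such a config. It assumes mergekit's slices/sources/layer_range YAML layout with merge_method: passthrough, and an end-exclusive layer_range as in mergekit's published examples. The helper names, the layer count, and the dropped indices used here are hypothetical and are not the ones used for Cluj-Napoca-0.2.

```python
def keep_ranges(num_layers: int, drop: list[int]) -> list[tuple[int, int]]:
    """Contiguous [start, end) ranges of layer indices that are NOT dropped."""
    dropped = set(drop)
    ranges: list[tuple[int, int]] = []
    start = None
    for i in range(num_layers):
        if i in dropped:
            if start is not None:  # close the current run of kept layers
                ranges.append((start, i))
                start = None
        elif start is None:  # open a new run of kept layers
            start = i
    if start is not None:
        ranges.append((start, num_layers))
    return ranges


def passthrough_config(model: str, num_layers: int, drop: list[int],
                       dtype: str = "bfloat16") -> str:
    """Build a mergekit-style passthrough YAML string that skips the dropped layers."""
    lines = ["slices:"]
    for start, end in keep_ranges(num_layers, drop):
        lines += [
            "  - sources:",
            f"      - model: {model}",
            f"        layer_range: [{start}, {end}]",  # end-exclusive (assumed)
        ]
    lines += ["merge_method: passthrough", f"dtype: {dtype}"]
    return "\n".join(lines)
```

For example, passthrough_config("Mihaiii/Pallas-0.5", num_layers, dropped) produces one slice per run of kept layers. The num_layers and dropped values are placeholders that would come from an SNR screen like the one described under Key Characteristics. The resulting YAML could then be saved to a file and passed to mergekit's mergekit-yaml command along with an output directory.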