Mihaiii/Cluj-Napoca-0.2

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Feb 22, 2024 | License: yi-license | Architecture: Transformer

Mihaiii/Cluj-Napoca-0.2 is a 34 billion parameter experimental language model developed by Mihaiii, derived from Mihaiii/Pallas-0.5. It was created by layer pruning with laserRMT and mergekit: layers whose self-attention value projections have an infinite signal-to-noise ratio are removed. The model demonstrates a method for reducing model complexity while aiming to maintain or improve performance, and it is part of a series exploring iterative fine-tuning and pruning strategies.

Overview

Mihaiii/Cluj-Napoca-0.2 is a 34 billion parameter experimental language model developed by Mihaiii. It is part of the "Cluj-Napoca" series, which focuses on exploring model optimization through layer pruning and iterative fine-tuning. This specific version, 0.2, is derived from the Mihaiii/Pallas-0.5 model.

Key Characteristics

  • Layer Pruning: The model is created by systematically eliminating specific layers from its base model, Mihaiii/Pallas-0.5. This process identifies and removes layers where the signal-to-noise ratio (SNR) of the self_attn.v_proj component is infinite, indicating potentially redundant or low-contribution layers.
  • Methodology: The laserQlora.ipynb notebook from cognitivecomputations/laserRMT is used to calculate the SNR for each layer. mergekit is then used to construct the pruned model from the layers that are kept.
  • Experimental Series: Cluj-Napoca-0.2 is an early iteration in a series of models. Later versions (0.3-0.5 and 0.7-0.11) are each fine-tuned from the preceding version, while 0.6 is a further pruned version of 0.5.
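
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual script: the SNR values are made up for an 8-layer toy model, the helper names `kept_layer_ranges` and `passthrough_config` are hypothetical, and the `bfloat16` dtype is an assumption. The generated text follows the general shape of a mergekit passthrough configuration, assuming `layer_range` is half-open (start inclusive, end exclusive) as in mergekit's published examples.

```python
import math


def kept_layer_ranges(snr_by_layer):
    """Group layers with finite SNR into half-open (start, end) ranges.

    snr_by_layer[i] is the SNR measured for layer i (here: for its
    self_attn.v_proj weights). Layers with infinite SNR are dropped.
    """
    ranges = []
    start = None
    for idx, snr in enumerate(snr_by_layer):
        if math.isinf(snr):
            # Close the current run of kept layers, if any.
            if start is not None:
                ranges.append((start, idx))
                start = None
        elif start is None:
            start = idx
    if start is not None:
        ranges.append((start, len(snr_by_layer)))
    return ranges


def passthrough_config(model, ranges, dtype="bfloat16"):
    """Render a passthrough-style merge config as YAML text.

    The dtype default is an assumption, not taken from the model card.
    """
    lines = ["slices:"]
    for start, end in ranges:
        lines += [
            "  - sources:",
            f"      - model: {model}",
            f"        layer_range: [{start}, {end}]",
        ]
    lines += ["merge_method: passthrough", f"dtype: {dtype}"]
    return "\n".join(lines) + "\n"


# Hypothetical SNR values for an 8-layer toy model (not real measurements).
toy_snr = [1.2, 0.8, math.inf, 2.1, math.inf, math.inf, 0.5, 0.9]
ranges = kept_layer_ranges(toy_snr)
config_text = passthrough_config("Mihaiii/Pallas-0.5", ranges)
print(config_text)
```

In this toy run, layers 2, 4, and 5 are dropped, so the config contains three slices covering layers 0-1, 3, and 6-7. The real per-layer SNR values come from the laserRMT notebook, and the exact configuration used for this model is given in its README.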

Intended Use

This model is primarily intended for researchers and developers interested in:

  • Model Compression Techniques: Studying the effects and efficacy of layer pruning based on signal-to-noise ratio analysis.
  • Experimental LLM Development: Exploring iterative fine-tuning and pruning strategies for large language models.
  • Replication of Pruning Methods: Reproducing the pruning process used to create this model by following the steps and configurations given in the model's README.