uukuguy/zephyr-7b-alpha-dare-0.85: An Experiment in Parameter Efficiency
This model, developed by uukuguy, is a 7-billion-parameter variant of Zephyr-7B-alpha produced with the experimental DARE (Drop And REscale) method. The core idea behind DARE is that a large fraction of a fine-tuned model's delta parameters (the element-wise differences between the fine-tuned and base weights) can be dropped (set to zero), with the survivors rescaled, without compromising performance.
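The drop-and-rescale step can be sketched as below. This is a minimal toy illustration of the idea, not the reference DARE implementation, which operates on full model checkpoints rather than plain Python lists:

```python
import random

def dare(delta, mask_rate, rescale=True, seed=0):
    """Randomly zero a fraction `mask_rate` of delta parameters and,
    if `rescale` is set, divide the survivors by (1 - mask_rate) so the
    expected value of each delta is preserved."""
    rng = random.Random(seed)
    out = []
    for d in delta:
        if rng.random() < mask_rate:
            out.append(0.0)  # dropped delta
        else:
            out.append(d / (1.0 - mask_rate) if rescale else d)
    return out

# Toy example: drop ~85% of deltas; survivors are scaled by 1 / 0.15
delta = [0.02, -0.01, 0.03, 0.005, -0.04]
print(dare(delta, mask_rate=0.85))
```

Because each surviving delta is divided by the keep probability, the masked deltas are an unbiased estimate of the originals in expectation, which is what lets such a high drop rate leave behavior largely intact.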
Key Characteristics
- Parameter Efficiency Experiment: Investigates the DARE method with a weight_mask_rate of 0.85, meaning 85% of delta parameters are masked to zero.
- Rescaling: Uses use_weight_rescale: True and scaling_coefficient: 1.0 to compensate for the dropped parameters.
- Base Model: Built on the HuggingFaceH4/zephyr-7b-alpha foundation.
- Context Length: Supports an 8192-token context window.
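Putting these settings together, the merged weights would be formed roughly as follows. This is a hedged sketch in which the configuration keys above are assumed to map onto the function parameters shown; it is not the author's actual merge script:

```python
import random

def dare_merge(base, finetuned, weight_mask_rate=0.85,
               use_weight_rescale=True, scaling_coefficient=1.0, seed=0):
    """Merge fine-tuned weights back into base weights, DARE-style:
    compute delta = finetuned - base, drop a weight_mask_rate fraction
    of deltas, optionally rescale the survivors by 1/(1 - mask rate),
    then add scaling_coefficient * delta back onto the base weight."""
    rng = random.Random(seed)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < weight_mask_rate:
            delta = 0.0  # this delta is dropped entirely
        elif use_weight_rescale:
            delta /= (1.0 - weight_mask_rate)  # rescale the survivor
        merged.append(b + scaling_coefficient * delta)
    return merged

# Toy 4-weight example with the model card's settings
base = [0.5, -0.2, 0.1, 0.0]
tuned = [0.52, -0.25, 0.13, 0.01]
print(dare_merge(base, tuned))
```

With scaling_coefficient set to 1.0, each merged weight is either the untouched base weight (where the delta was dropped) or the base weight plus the rescaled delta.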
Performance Insights
The model is evaluated across standard benchmarks, including ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K, and DROP. While it does not lead in any single category, its average score of 52.4 on the provided benchmark table matches that of the base Zephyr-7B-alpha (also 52.4), suggesting that the DARE method can maintain performance even at an 85% mask rate. This makes it a useful reference point for research into model compression and efficient fine-tuning techniques.