Model Overview: uukuguy/neural-chat-7b-v3-1-dare-0.85
This model is an experimental 7 billion parameter language model derived from the Intel/neural-chat-7b-v3-1 base, incorporating the DARE (Drop And REscale) technique. The DARE method investigates the resilience of fine-tuned language models to parameter pruning by setting a high proportion (here, 0.85) of the delta parameters — the differences between the fine-tuned weights and the base weights — to zero. This approach aims to show how much of the fine-tuning delta can be discarded while maintaining performance.
Key Characteristics
- DARE Method Application: Uses a weight_mask_rate of 0.85, use_weight_rescale set to True, a random mask strategy, and a scaling_coefficient of 1.0.
- Performance Retention: The experiment suggests that a substantial portion of delta parameters can be pruned without significantly affecting the model's capabilities.
- Base Model: Built upon the Intel/neural-chat-7b-v3-1 architecture, known for its general language understanding and generation abilities.
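The parameters above can be illustrated with a minimal sketch of the DARE operation on a single weight tensor. This is not the actual merging code used to produce this model; the function name and signature are hypothetical, and NumPy stands in for the real tensor framework:

```python
import numpy as np

def dare_merge(base, finetuned, weight_mask_rate=0.85,
               use_weight_rescale=True, scaling_coefficient=1.0, rng=None):
    """Illustrative DARE (Drop And REscale) on one weight tensor.

    delta = finetuned - base; a random mask drops `weight_mask_rate`
    of the delta entries, survivors are rescaled by
    1 / (1 - weight_mask_rate), and the result is added to the base.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = finetuned - base
    # Random mask strategy: keep each delta entry with probability
    # (1 - weight_mask_rate), i.e. drop 85% of them at rate 0.85.
    keep = rng.random(delta.shape) >= weight_mask_rate
    pruned = delta * keep
    if use_weight_rescale and weight_mask_rate > 0.0:
        # Rescale survivors so the expected delta magnitude is preserved.
        pruned = pruned / (1.0 - weight_mask_rate)
    return base + scaling_coefficient * pruned
```

With weight_mask_rate=0.85, roughly 85% of the delta entries are zeroed, so those weights revert to the base model's values; the rescaling keeps the expected contribution of the remaining delta unchanged.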
Benchmarks
The model's performance is evaluated across various benchmarks, showing competitive results compared to other 7B models:
- Average Score: Achieves an average score of 59.06, matching the base Intel/neural-chat-7b-v3-1 model.
- Reasoning: Scores 66.21 on ARC and 62.37 on MMLU.
- Common Sense: Achieves 83.64 on HellaSwag and 78.14 on Winogrande.
- Knowledge: Scores 59.65 on TruthfulQA.
- Math/Problem Solving: Achieves 19.56 on GSM8K and 43.84 on DROP.
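The 59.06 average is simply the arithmetic mean of the seven benchmark scores listed above, which can be verified directly:

```python
# Benchmark scores as reported in this model card.
scores = {
    "ARC": 66.21, "MMLU": 62.37, "HellaSwag": 83.64,
    "Winogrande": 78.14, "TruthfulQA": 59.65,
    "GSM8K": 19.56, "DROP": 43.84,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 59.06
```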
Good For
- Research into Model Sparsity: Ideal for researchers exploring parameter pruning techniques like DARE and their impact on LLM performance.
- General Language Tasks: Suitable for applications requiring robust language understanding and generation, leveraging the capabilities of its base model.
- Efficiency Studies: Useful for investigating potential pathways to more efficient model deployment through parameter reduction.