speechless-mistral-7b-dare-0.85 Overview
The speechless-mistral-7b-dare-0.85 model is a 7-billion-parameter language model built on the Mistral architecture. Developed by speechlessai, it applies the DARE (Drop And REscale) technique with a weight_mask_rate of 0.85 (i.e., 85% of delta parameters dropped) and use_weight_rescale set to True. The model is a merge of six distinct DARE-processed models, aiming to improve performance by selectively pruning and rescaling fine-tuning deltas.
Key Capabilities and Performance
This model demonstrates strong performance across a range of benchmarks, as highlighted in its evaluation table. Notably, it achieves:
- 64.69 average score, surpassing several comparable 7B models.
- 45.56 on GSM8K, indicating significant strength in mathematical reasoning.
- 64.29 on MMLU, demonstrating robust general knowledge and understanding.
- 84.82 on HellaSwag, showing strong common-sense reasoning.
DARE Technique
The DARE technique sets a large proportion of delta parameters (the differences between a fine-tuned model's weights and its base model's weights) to zero without compromising the fine-tuned model's capabilities; with rescaling enabled, the surviving deltas are scaled up by 1 / (1 - drop rate) so their expected contribution is preserved. This approach suggests that larger models can tolerate higher drop rates, leading to potentially more efficient merges without substantial performance degradation. This specific model uses a random masking strategy and a scaling coefficient of 1.0.
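The drop-and-rescale step described above can be sketched in plain Python. This is a minimal illustration, not the merge tooling actually used for this model: it treats one parameter tensor as a flat list, drops each delta with probability equal to the drop rate, and rescales the survivors by 1 / (1 - drop rate) so the expected delta magnitude is unchanged. The function name `dare_merge_one` and its signature are illustrative assumptions.

```python
import random

def dare_merge_one(base, finetuned, drop_rate=0.85, seed=0):
    """Apply DARE to one parameter vector (illustrative sketch).

    Each delta (finetuned - base) is randomly dropped with probability
    `drop_rate`; surviving deltas are rescaled by 1 / (1 - drop_rate),
    which keeps the expected value of each delta unchanged.
    """
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < drop_rate:
            delta = 0.0           # drop this delta parameter
        else:
            delta *= scale        # rescale the surviving delta
        merged.append(b + delta)
    return merged
```

Merging several fine-tuned models would then amount to summing (or averaging) the DARE-processed deltas from each model before adding them back to the shared base weights.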
Good For
- Applications requiring strong mathematical reasoning and general knowledge.
- Scenarios where a 7B parameter model with optimized performance is desired.
- Research into the effectiveness of parameter pruning techniques like DARE.