speechless-mistral-7b-dare-0.85 Overview
The speechless-mistral-7b-dare-0.85 model is a 7-billion-parameter language model built on the Mistral architecture. Developed by speechlessai, it applies the DARE (Drop And REscale) technique with a weight_mask_rate of 0.85 (i.e., 85% of delta parameters dropped) and use_weight_rescale set to True. The model is a merge of six distinct DARE-processed models, aiming to improve performance by selectively pruning and rescaling fine-tuning deltas.
Key Capabilities and Performance
This model demonstrates strong performance across a range of benchmarks, as highlighted in its evaluation table. Notably, it achieves:
- 64.69 average score, surpassing several comparable 7B models.
- 45.56 on GSM8K, indicating significant strength in mathematical reasoning.
- 64.29 on MMLU, demonstrating robust general knowledge and understanding.
- 84.82 on HellaSwag, showing strong common-sense reasoning.
DARE Technique
The DARE technique sets a large proportion of delta parameters (the differences between a fine-tuned model's weights and its base model's weights) to zero without compromising the fine-tuned model's capabilities; with rescaling enabled, the surviving deltas are scaled up by 1 / (1 - drop rate) so their expected contribution is preserved. This approach suggests that larger models can tolerate higher drop rates, leading to potentially more efficient merges without substantial performance degradation. This specific model uses a random masking strategy and a scaling coefficient of 1.0.
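The drop-and-rescale step described above can be sketched in plain Python. This is a minimal illustration, not the merge tooling actually used for this model: it treats one parameter tensor as a flat list, drops each delta with probability equal to the drop rate, and rescales the survivors by 1 / (1 - drop rate) so the expected delta magnitude is unchanged. The function name `dare_merge_one` and its signature are illustrative assumptions.

```python
import random

def dare_merge_one(base, finetuned, drop_rate=0.85, seed=0):
    """Apply DARE to one parameter vector (illustrative sketch).

    Each delta (finetuned - base) is randomly dropped with probability
    `drop_rate`; surviving deltas are rescaled by 1 / (1 - drop_rate),
    which keeps the expected value of each delta unchanged.
    """
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < drop_rate:
            delta = 0.0           # drop this delta parameter
        else:
            delta *= scale        # rescale the surviving delta
        merged.append(b + delta)
    return merged
```

Merging several fine-tuned models would then amount to summing (or averaging) the DARE-processed deltas from each model before adding them back to the shared base weights.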
Good For
- Applications requiring strong mathematical reasoning and general knowledge.
- Scenarios where a 7B parameter model with optimized performance is desired.
- Research into the effectiveness of parameter pruning techniques like DARE.