crumb/apricot-wildflower-20
crumb/apricot-wildflower-20 is a 7-billion-parameter causal language model fine-tuned from Mistral-7B. It was trained for 1,000 steps with a combined language-modeling and distillation loss on a filtered OpenWebText2 dataset, using teacher logits from Mixtral. While its overall benchmark scores fall slightly below the base Mistral-7B model, it offers a compact alternative for general text-generation tasks.
apricot-wildflower-20: A Mistral-7B Fine-tune
apricot-wildflower-20 is a 7-billion-parameter language model fine-tuned from Mistral-7B-v0.1. It underwent 1,000 steps of fine-tuning with a combined objective: a standard language-modeling loss plus a distillation loss. Training used a filtered subset of OpenWebText2, keeping only entries with a Reddit score of 20 or higher, and employed logits from Mixtral as the distillation teacher.
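A common way to picture this combined objective is a weighted sum of the usual next-token cross-entropy and a KL-divergence term against the teacher's logits. The PyTorch sketch below is a minimal illustration of that idea, not the card's published training code; the weighting `alpha`, the `temperature`, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def combined_loss(student_logits, teacher_logits, labels,
                  alpha=0.5, temperature=1.0):
    """Weighted sum of LM cross-entropy and a distillation KL term.

    A minimal sketch: `alpha` and `temperature` are illustrative
    assumptions, not values published for apricot-wildflower-20.
    """
    vocab = student_logits.size(-1)
    # Standard causal-LM loss: predict token t+1 from positions <= t.
    lm_loss = F.cross_entropy(
        student_logits[:, :-1].reshape(-1, vocab),
        labels[:, 1:].reshape(-1),
    )
    # Distillation loss: match the teacher's (here, Mixtral's)
    # temperature-softened output distribution.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return alpha * lm_loss + (1 - alpha) * kd_loss
```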
Key Characteristics
- Base Model: Fine-tuned from Mistral-7B-v0.1.
- Training Data: OpenWebText2 with a Reddit score filter (>=20).
- Training Method: Combined LM loss and distillation loss using Mixtral logits.
- Parameter Count: 7 billion parameters.
- Context Length: Supports an 8192 token context window.
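For reference, the model can be loaded under its hub id crumb/apricot-wildflower-20 with the standard Hugging Face transformers API. The generation settings below are illustrative defaults, not recommendations from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/apricot-wildflower-20"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s)/CPU (requires accelerate)
)

inputs = tokenizer("The apricot wildflower blooms", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```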
Performance Overview
Evaluations on the Open LLM Leaderboard indicate that apricot-wildflower-20 generally performs slightly below its base model, Mistral-7B-v0.1. For instance, its average score is 59.74 compared to Mistral-7B's 60.97. Specific benchmark scores include:
- MMLU (5-shot): 63.38
- HellaSwag (10-shot): 81.76
- TruthfulQA (0-shot): 41.76
- GSM8k (5-shot): 33.97
Potential Use Cases
This model is a candidate for general text-generation tasks where a 7B-parameter model is suitable, particularly when its fine-tuning characteristics align with the desired output style or domain. It can also serve as a base for further domain-specific fine-tuning, as sketched below.
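One concrete path for that further fine-tuning is a parameter-efficient adapter such as LoRA, which keeps the 7B base weights frozen and trains only small low-rank adapter matrices. The sketch below uses the peft library; the rank, alpha, and target modules are illustrative assumptions, not values from the model card.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("crumb/apricot-wildflower-20")

# Illustrative LoRA settings; rank, alpha, and target modules are
# assumptions, not values taken from the model card.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in Mistral-style blocks
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```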