HuggingFaceH4/zephyr-7b-alpha
Zephyr-7B-alpha is a 7 billion parameter language model developed by HuggingFaceH4, fine-tuned from Mistral-7B-v0.1. It is optimized to act as a helpful assistant, trained using Direct Preference Optimization (DPO) on a mix of publicly available, synthetic datasets. The model targets chat applications; some in-built alignment was deliberately removed from the training data to enhance helpfulness.
Zephyr-7B-alpha: A Fine-Tuned Assistant Model
Zephyr-7B-alpha is the inaugural model in the Zephyr series, developed by HuggingFaceH4. It is a 7 billion parameter language model, building upon the robust mistralai/Mistral-7B-v0.1 base model. This model is specifically fine-tuned to function as a helpful assistant.
Key Capabilities & Training
- Fine-tuning Method: Zephyr-7B-alpha was trained using Direct Preference Optimization (DPO).
- Dataset Mix: Training involved a combination of publicly available, synthetic datasets, starting with an initial fine-tuning on a variant of the UltraChat dataset.
- Alignment: Further alignment was performed with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and GPT-4-ranked model completions.
- Performance Focus: The model's training intentionally removed some in-built alignment from the datasets to boost performance on MT-Bench and enhance helpfulness.
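DPO optimizes the policy directly from preference pairs (a chosen and a rejected completion) rather than training a separate reward model. A minimal sketch of the per-pair DPO objective, assuming summed log-probabilities are already available; this is illustrative, not the actual TRL `DPOTrainer` training code:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair Direct Preference Optimization loss.

    Each argument is the summed log-probability of a completion under
    the trained policy or the frozen reference model; beta controls how
    far the policy may drift from the reference.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written stably as softplus(-logits)
    return math.log1p(math.exp(-logits))

# Loss is small when the policy favors the chosen completion more
# strongly than the reference does, and large in the opposite case.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
print(dpo_loss(-14.0, -10.0, -12.0, -12.0))
```

Minimizing this loss pushes the policy to widen its preference gap for chosen over rejected completions, while the reference-model terms keep it from drifting arbitrarily far from the base model.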
Intended Use & Limitations
Zephyr-7B-alpha is primarily intended for chat applications, offering strong performance as a conversational assistant. However, because it was not aligned with techniques like RLHF or deployed with in-the-loop filtering of responses, the model is more prone to generating problematic outputs if explicitly prompted to do so. Users should be aware of this potential for unaligned responses.
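For chat use, prompts follow a role-tagged format with `<|system|>`, `<|user|>`, and `<|assistant|>` markers, each turn terminated by `</s>`. The sketch below assembles such a prompt by hand for illustration; in practice, `tokenizer.apply_chat_template()` applies the model's own template, and the exact token layout here is an assumption based on the Zephyr-style format:

```python
def build_zephyr_prompt(messages):
    """Assemble a Zephyr-style chat prompt from role/content messages.

    Illustrative only: prefer tokenizer.apply_chat_template(), which
    uses the template shipped with the model.
    """
    parts = []
    for msg in messages:
        # Each turn: <|role|>\n{content}</s>\n
        parts.append(f"<|{msg['role']}|>\n{msg['content']}</s>\n")
    # A trailing assistant tag cues the model to generate its reply
    parts.append("<|assistant|>\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
print(build_zephyr_prompt(messages))
```

The resulting string can be tokenized and passed to the model directly, or the same `messages` list can be handed to a `transformers` text-generation pipeline that applies the chat template for you.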