HuggingFaceH4/zephyr-7b-alpha

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 8K · Published: Oct 9, 2023 · License: MIT · Architecture: Transformer · Open Weights

Zephyr-7B-alpha is a 7 billion parameter language model developed by HuggingFaceH4, fine-tuned from Mistral-7B-v0.1. It is optimized to act as a helpful assistant, trained using Direct Preference Optimization (DPO) on a mix of publicly available, synthetic datasets. The model excels in chat-based applications: its training intentionally omitted some of the in-built alignment present in those datasets, which improves helpfulness at the cost of weaker safety guardrails.


Zephyr-7B-alpha: A Fine-Tuned Assistant Model

Zephyr-7B-alpha is the inaugural model in the Zephyr series, developed by HuggingFaceH4. It is a 7 billion parameter language model, building upon the robust mistralai/Mistral-7B-v0.1 base model. This model is specifically fine-tuned to function as a helpful assistant.

Key Capabilities & Training

  • Fine-tuning Method: Zephyr-7B-alpha was trained using Direct Preference Optimization (DPO).
  • Dataset Mix: Training involved a combination of publicly available, synthetic datasets, including an initial fine-tuning on a variant of the UltraChat dataset.
  • Alignment: Further alignment was performed with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and GPT-4 ranked model completions.
  • Performance Focus: The model's training intentionally removed some in-built alignment from datasets to boost performance on MT Bench and enhance helpfulness.
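To make the DPO training objective concrete, the sketch below computes the per-pair DPO loss from the summed log-probabilities of a chosen and a rejected completion under the policy and the frozen reference model. This is a minimal illustration of the published DPO formula (with the hypothetical default `beta=0.1`), not the actual TRL `DPOTrainer` implementation used to train Zephyr.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected completion under the policy or the reference model.
    """
    # Log-ratios: how much more the policy favors each completion
    # than the frozen reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): minimized when the policy widens
    # the gap between chosen and rejected relative to the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the loss sits at `log 2`; it falls below that as the policy learns to prefer the chosen completion more strongly than the reference does.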

Intended Use & Limitations

Zephyr-7B-alpha is primarily intended for chat applications, offering strong performance as a conversational assistant. However, due to the deliberate removal of certain alignment techniques (like RLHF or in-the-loop filtering), the model is more prone to generating problematic outputs if explicitly prompted to do so. Users should be aware of this potential for unaligned responses.
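For chat use, Zephyr's tokenizer ships a chat template that wraps each turn in `<|system|>`, `<|user|>`, and `<|assistant|>` headers terminated by the `</s>` end-of-sequence token. The sketch below builds that prompt layout by hand to show the format; in practice you would call `tokenizer.apply_chat_template` from 🤗 Transformers rather than formatting strings yourself.

```python
def build_zephyr_prompt(messages):
    """Format a list of {'role', 'content'} dicts into Zephyr's
    chat layout.  Each turn is opened with a <|role|> header and
    closed with </s>; the prompt ends with an open <|assistant|>
    header so generation continues as the assistant."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}</s>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_zephyr_prompt([
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hello!"},
])
```

Feeding a prompt in this shape to the model keeps generations in-distribution with how the assistant turns were formatted during fine-tuning.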

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model vary across the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p