Xenon1/Zenith-7B-dpo
Zenith-7B-dpo by Xenon1 is a 7-billion-parameter language model fine-tuned from Mistral-7B-v0.1 on the UltraFeedback dataset using techniques from the "Self-Rewarding Language Models" paper. It leverages Grouped-Query Attention and Sliding-Window Attention for efficient inference and is optimized for instruction-following and conversational tasks.
Zenith-7B-dpo: Instruction-Tuned Mistral-7B
Zenith-7B-dpo is a 7-billion-parameter language model developed by Xenon1, built on the Mistral-7B-v0.1 architecture. It was fine-tuned on the UltraFeedback dataset, incorporating techniques from the paper "Self-Rewarding Language Models" (arXiv:2401.10020).
Key Architectural Features
- Grouped-Query Attention: Enhances inference speed and reduces memory footprint.
- Sliding-Window Attention: Optimizes context handling for longer sequences. Both attention settings can be inspected via the configuration sketch after this list.
- Byte-fallback BPE tokenizer: Provides robust tokenization across diverse text.
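As a minimal sketch, and assuming the transformers library is installed and the Xenon1/Zenith-7B-dpo repository is reachable on the Hugging Face Hub with a standard Mistral-style config, the attention settings above can be inspected without downloading any weights:

```python
from transformers import AutoConfig

# Load only the model configuration; no weights are downloaded.
config = AutoConfig.from_pretrained("Xenon1/Zenith-7B-dpo")

# Grouped-Query Attention: fewer key/value heads than query heads.
print("attention heads:", config.num_attention_heads)  # 32 for Mistral-7B-v0.1
print("key/value heads:", config.num_key_value_heads)  # 8 for Mistral-7B-v0.1

# Sliding-Window Attention: each token attends to at most this many preceding tokens.
print("sliding window:", config.sliding_window)  # 4096 for Mistral-7B-v0.1
```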
Instruction Format
Zenith-7B-dpo is designed for instruction-following. Prompts should be wrapped in [INST] and [/INST] tokens, with the first instruction preceded by a begin-of-sentence (BOS) token. This format is supported by Hugging Face's apply_chat_template() method, making it straightforward to integrate into conversational applications.
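A short sketch of the prompt formatting, assuming the repository ships a Mistral-style chat template in its tokenizer configuration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Xenon1/Zenith-7B-dpo")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "I'm quite partial to fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Renders the conversation into the [INST] ... [/INST] format described above,
# with the BOS token prepended before the first instruction.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# Expected shape (Mistral-style template):
# <s>[INST] What is your favourite condiment? [/INST]I'm quite partial to fresh lemon juice.</s>[INST] Do you have mayonnaise recipes? [/INST]
```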
Ideal Use Cases
- Instruction-following chatbots: Excels at generating coherent, contextually relevant responses to user prompts (a minimal generation sketch follows this list).
- Conversational AI: Suitable for applications requiring natural language interaction and dialogue generation.
- Research and experimentation: Provides a strong base model for further fine-tuning on specific instruction-based tasks.
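For the chatbot and conversational use cases above, a minimal local-generation sketch might look like the following; it assumes a GPU with enough memory for 7B float16 weights and that the repository loads directly via transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Xenon1/Zenith-7B-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain sliding-window attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling parameters here are illustrative, not a recommended configuration.
output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```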