Xenon1/Zenith-7B-dpo

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: Feb 14, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Zenith-7B-dpo by Xenon1 is a 7 billion parameter language model, fine-tuned from Mistral-7B-v0.1 using the Ultrafeedback dataset and techniques from the "Self-Rewarding Language Models" paper. It uses Grouped-Query Attention and Sliding-Window Attention, and it is tuned for instruction-following and conversational use.


Zenith-7B-dpo: Instruction-Tuned Mistral-7B

Zenith-7B-dpo is a 7 billion parameter language model developed by Xenon1 and built on Mistral-7B-v0.1. It was fine-tuned on the Ultrafeedback dataset using techniques from the paper "Self-Rewarding Language Models" (arXiv:2401.10020).

Key Architectural Features

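  • Grouped-Query Attention: Several query heads share each key/value head, which shrinks the key/value cache and speeds up inference.
  • Sliding-Window Attention: Each token attends only to a fixed-size window of preceding tokens, which bounds attention cost on longer sequences.
  • Byte-fallback BPE tokenizer: Characters missing from the vocabulary are encoded as raw bytes instead of being mapped to an unknown token.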
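
The two attention features above can be illustrated with a small, framework-free sketch. It is not the model's implementation: the function names are hypothetical, and the sequence length, window size, and head counts are toy values chosen for readability rather than Zenith-7B-dpo's actual configuration. The head grouping shown (consecutive query heads sharing one key/value head) is one common convention and is assumed here.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Attention is causal (j <= i) and limited to the last `window` positions.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares under grouped-query attention."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Toy values (assumptions for illustration only).
for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))

# With 8 query heads and 2 key/value heads, each key/value head serves 4 query heads.
print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

Each printed mask row has at most three `x` marks, which is why the work per token stays constant as the sequence grows. The head mapping shows why the key/value cache shrinks: only 2 sets of keys and values are stored instead of 8.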

Instruction Format

Zenith-7B-dpo expects prompts in an instruction format. Each instruction should be enclosed within [INST] and [/INST] tokens, and the first instruction should be preceded by a begin-of-sentence ID. Hugging Face's apply_chat_template() method produces this format from a list of chat messages.
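
To make the layout concrete, here is a minimal sketch that assembles such a prompt by hand. The helper name build_prompt is hypothetical, and the literal "<s>" and "</s>" strings and the exact spacing are assumptions based on the usual Mistral-style convention; the chat template shipped with the tokenizer is the authoritative source and may differ in whitespace.

```python
BOS = "<s>"   # assumed begin-of-sentence token string
EOS = "</s>"  # assumed end-of-sentence token string


def build_prompt(messages: list[dict[str, str]]) -> str:
    """Render alternating user/assistant messages into an [INST] ... [/INST] prompt.

    The begin-of-sentence marker appears once, before the first instruction.
    """
    if not messages:
        raise ValueError("at least one message is required")
    parts = [BOS]
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if expected_role == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # Assistant replies are closed with the end-of-sentence marker.
            parts.append(f" {message['content']}{EOS}")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Mustard, without a doubt."},
    {"role": "user", "content": "Do you have a recipe that uses it?"},
]
print(build_prompt(conversation))
```

In practice, passing the same list of messages to the tokenizer's apply_chat_template() is the safer route, since it applies the template stored with the model. If a hand-built string is tokenized separately, check that the tokenizer does not add a second begin-of-sentence token.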

Ideal Use Cases

  • Instruction-following chatbots: Generating coherent, contextually relevant responses to user prompts.
  • Conversational AI: Applications that require natural language interaction and dialogue generation.
  • Research and experimentation: A starting point for further fine-tuning on specific instruction-based tasks.
