CorticalStack/mistral-7b-tak-stack-dpo
CorticalStack/mistral-7b-tak-stack-dpo is a 7 billion parameter language model fine-tuned from Mistral-7B-v0.1 with Direct Preference Optimization (DPO) on the CorticalStack/tak-stack-dpo dataset. The model supports a context length of 8192 tokens, and the DPO fine-tuning is intended to improve response quality and alignment with human preferences.
Model Overview
CorticalStack/mistral-7b-tak-stack-dpo is a 7 billion parameter language model derived from the Mistral-7B-v0.1 architecture. It has been fine-tuned with Direct Preference Optimization (DPO), a method that aligns language models with human preferences directly from preference pairs, without training a separate reward model. Fine-tuning used the CorticalStack/tak-stack-dpo dataset.
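A minimal sketch of loading the model with the Hugging Face transformers library; the generation settings shown (device placement, token limit) are illustrative assumptions, not values specified by the card:

```python
# Hedged sketch: load the model and generate text with transformers.
# device_map and max_new_tokens are illustrative choices, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CorticalStack/mistral-7b-tak-stack-dpo"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Loading the full 7B checkpoint requires a GPU with sufficient memory (or quantization); this sketch omits those details.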
Key Capabilities
- DPO-aligned responses: Optimized to generate outputs that are preferred by humans, based on the DPO training methodology.
- Mistral-7B foundation: Inherits the strong base capabilities of the original Mistral-7B-v0.1 model.
- Efficient fine-tuning: Utilizes LoRA (Low-Rank Adaptation) with specific parameters (r=32, alpha=32, dropout=0.05) for efficient adaptation.
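The LoRA setup above can be illustrated with a small numerical sketch: the adapted weight is the frozen base weight plus a low-rank correction B @ A scaled by alpha / r. The matrix dimensions below are toy values chosen for illustration; only r, alpha, and the zero-initialization of B reflect standard LoRA practice and the card's stated parameters:

```python
import numpy as np

r, alpha = 32, 32            # rank and scaling from the card
d_out, d_in = 64, 64         # toy layer dimensions (illustrative)

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

# Effective weight after adaptation; only A and B are trained.
W_adapted = W + (alpha / r) * (B @ A)

# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(W_adapted, W)
```

Because only A and B (2 * r * d parameters per layer) are updated, LoRA fine-tunes a small fraction of the model's weights; dropout (0.05 here) is applied to the adapter path during training.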
Training Details
The model was trained with a batch size of 4, 4 gradient accumulation steps, and the paged_adamw_32bit optimizer. Training ran for 100 steps with a learning rate of 5e-05 and a cosine learning-rate scheduler. The maximum prompt length was 1024 tokens and the maximum sequence length 1536 tokens, with a DPO beta of 0.1.
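The role of beta can be seen in the DPO objective itself. A minimal sketch of the per-pair loss, using plain Python (the actual training used a full DPO trainer; the log-probability values below are made up for illustration):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    pi_* are log-probabilities of the chosen/rejected responses under the
    policy being trained; ref_* are the same under the frozen reference
    model. beta = 0.1 matches the value reported in the card.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log sigmoid(margin): small when the policy prefers the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative log-probs: the policy widens the chosen/rejected gap
# relative to the reference, so the loss drops below -log(0.5).
assert dpo_loss(-10.0, -20.0, -12.0, -18.0) < math.log(2.0)
```

A larger beta penalizes divergence from the reference model more sharply; 0.1 is a common, relatively permissive setting.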