CorticalStack/mistral-7b-tak-stack-dpo

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Feb 28, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

CorticalStack/mistral-7b-tak-stack-dpo is a 7-billion-parameter language model fine-tuned from Mistral-7B-v0.1 with Direct Preference Optimization (DPO) on the CorticalStack/tak-stack-dpo dataset. It supports a context length of 8192 tokens, and the DPO fine-tuning is intended to improve response quality and alignment with human preferences.


Model Overview

CorticalStack/mistral-7b-tak-stack-dpo is a 7-billion-parameter language model derived from the Mistral-7B-v0.1 architecture. It has been fine-tuned using Direct Preference Optimization (DPO), a method that aligns language models with human preferences without requiring a separate reward model. Fine-tuning used the CorticalStack/tak-stack-dpo dataset.
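As a minimal sketch of how the model might be loaded and prompted, assuming the standard Hugging Face transformers API and the Mistral `[INST]` instruct template used by the base model (actual generation requires downloading the weights):

```python
# Sketch: prompting the model via Hugging Face transformers.
# Assumes `transformers` is installed; weights are downloaded on first use.

MODEL_ID = "CorticalStack/mistral-7b-tak-stack-dpo"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Mistral instruct template.

    The tokenizer adds the BOS token itself, so it is omitted here.
    """
    return f"[INST] {user_message} [/INST]"


def generate(user_message: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a response (heavyweight: downloads weights)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Strip the prompt tokens before decoding, returning only the completion.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The `generate` helper is deliberately left uncalled here; in practice you would call it once per user turn.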

Key Capabilities

  • DPO-aligned responses: Optimized to generate outputs that are preferred by humans, based on the DPO training methodology.
  • Mistral-7B foundation: Inherits the strong base capabilities of the original Mistral-7B-v0.1 model.
  • Efficient fine-tuning: Utilizes LoRA (Low-Rank Adaptation) with specific parameters (r=32, alpha=32, dropout=0.05) for efficient adaptation.
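The LoRA hyperparameters above imply an adapter scaling factor of alpha/r = 32/32 = 1.0, i.e. the low-rank update is applied at full strength. A sketch of the equivalent adapter configuration (key names mirror the common PEFT `LoraConfig` convention; the target modules are an assumption, not stated on the card):

```python
# Sketch of the LoRA adapter configuration described on the model card.
# Keys mirror PEFT's LoraConfig fields; target_modules is a typical choice
# for Mistral-style attention layers, not confirmed by the card.
lora_config = {
    "r": 32,               # rank of the low-rank update matrices
    "lora_alpha": 32,      # scaling numerator
    "lora_dropout": 0.05,  # dropout applied to the adapter input
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
}

# LoRA scales the low-rank update BA by alpha / r; with r == alpha,
# the scaling factor is exactly 1.0.
lora_scaling = lora_config["lora_alpha"] / lora_config["r"]
```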

Training Details

The model was trained with a batch size of 4, gradient accumulation steps of 4, and a paged_adamw_32bit optimizer. It underwent 100 training steps with a learning rate of 5e-05 and a cosine learning rate scheduler. The maximum prompt length was set to 1024 tokens, and the maximum sequence length to 1536 tokens, with a beta value of 0.1 for DPO.
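Gathered as a config sketch, these hyperparameters look as follows (field names follow the TRL `DPOTrainer` convention where applicable; the effective batch size is derived from the card's numbers, assuming a single device):

```python
# Training hyperparameters from the model card, collected in one place.
dpo_training = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "optimizer": "paged_adamw_32bit",
    "max_steps": 100,
    "learning_rate": 5e-05,
    "lr_scheduler_type": "cosine",
    "max_prompt_length": 1024,  # tokens
    "max_length": 1536,         # prompt + completion, tokens
    "beta": 0.1,                # DPO KL-regularization strength
}

# Effective batch size = per-device batch * accumulation steps
# (assuming training on a single device, which the card does not specify).
effective_batch_size = (
    dpo_training["per_device_train_batch_size"]
    * dpo_training["gradient_accumulation_steps"]
)
```

With these settings, gradients from 16 examples are accumulated before each of the 100 optimizer steps.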

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model cover the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p. (The specific values are shown interactively on the model page.)
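A sketch of how these parameters map onto generation settings (names follow the Hugging Face `generate` convention; the values below are illustrative placeholders, not the community configurations from the page). The temperature's effect on the token distribution is also shown in pure Python:

```python
import math

# Illustrative sampler settings; these are NOT the popular values from the
# model page, which are only available interactively.
sampler_kwargs = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}
# frequency_penalty and presence_penalty are OpenAI-style API parameters;
# in transformers-based stacks, repetition_penalty plays a similar role.


def softmax_with_temperature(logits, temperature):
    """Temperature < 1 sharpens the token distribution; > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with logits `[2.0, 1.0, 0.0]`, lowering the temperature below 1.0 concentrates more probability mass on the top token.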