CorticalStack/mistral-7b-tak-stack-dpo
CorticalStack/mistral-7b-tak-stack-dpo is a 7 billion parameter language model fine-tuned from Mistral-7B-v0.1 with Direct Preference Optimization (DPO) on the CorticalStack/tak-stack-dpo dataset. The model supports a context length of 8192 tokens, and the DPO fine-tuning is intended to improve response quality and alignment with human preferences.
Model Overview
CorticalStack/mistral-7b-tak-stack-dpo is a 7 billion parameter language model derived from the Mistral-7B-v0.1 architecture. It has been fine-tuned with Direct Preference Optimization (DPO), a method that aligns language models with human preferences directly from preference pairs, without training a separate reward model. Fine-tuning used the CorticalStack/tak-stack-dpo dataset.
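A minimal sketch of loading the model with the Hugging Face transformers library; the generation settings shown (device placement, token limit) are illustrative assumptions, not values specified by the card:

```python
# Hedged sketch: load the model and generate text with transformers.
# device_map and max_new_tokens are illustrative choices, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CorticalStack/mistral-7b-tak-stack-dpo"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Loading the full 7B checkpoint requires a GPU with sufficient memory (or quantization); this sketch omits those details.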
Key Capabilities
- DPO-aligned responses: Optimized to generate outputs that are preferred by humans, based on the DPO training methodology.
- Mistral-7B foundation: Inherits the strong base capabilities of the original Mistral-7B-v0.1 model.
- Efficient fine-tuning: Utilizes LoRA (Low-Rank Adaptation) with specific parameters (r=32, alpha=32, dropout=0.05) for efficient adaptation.
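The LoRA setup above can be illustrated with a small numerical sketch: the adapted weight is the frozen base weight plus a low-rank correction B @ A scaled by alpha / r. The matrix dimensions below are toy values chosen for illustration; only r, alpha, and the zero-initialization of B reflect standard LoRA practice and the card's stated parameters:

```python
import numpy as np

r, alpha = 32, 32            # rank and scaling from the card
d_out, d_in = 64, 64         # toy layer dimensions (illustrative)

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

# Effective weight after adaptation; only A and B are trained.
W_adapted = W + (alpha / r) * (B @ A)

# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(W_adapted, W)
```

Because only A and B (2 * r * d parameters per layer) are updated, LoRA fine-tunes a small fraction of the model's weights; dropout (0.05 here) is applied to the adapter path during training.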
Training Details
The model was trained with a batch size of 4, 4 gradient accumulation steps, and the paged_adamw_32bit optimizer. Training ran for 100 steps with a learning rate of 5e-05 and a cosine learning-rate scheduler. The maximum prompt length was 1024 tokens and the maximum sequence length 1536 tokens, with a DPO beta of 0.1.
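The role of beta can be seen in the DPO objective itself. A minimal sketch of the per-pair loss, using plain Python (the actual training used a full DPO trainer; the log-probability values below are made up for illustration):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    pi_* are log-probabilities of the chosen/rejected responses under the
    policy being trained; ref_* are the same under the frozen reference
    model. beta = 0.1 matches the value reported in the card.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log sigmoid(margin): small when the policy prefers the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative log-probs: the policy widens the chosen/rejected gap
# relative to the reference, so the loss drops below -log(0.5).
assert dpo_loss(-10.0, -20.0, -12.0, -18.0) < math.log(2.0)
```

A larger beta penalizes divergence from the reference model more sharply; 0.1 is a common, relatively permissive setting.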