Delta-Vector/MS3.2-Austral-24B-KTO Overview
This model is a KTO (Kahneman-Tversky Optimization) checkpoint of Delta-Vector's MS3.2 Austral base model. It is a further refinement of the "Winton train" version, though the developer recommends the MS3.2 Winton train itself for the best overall experience. The KTO fine-tuning drew on a diverse collection of preference datasets, indicating a focus on aligning the model's outputs with human preferences and instructions.
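For readers unfamiliar with KTO, its core idea is a prospect-theoretic value function: desirable completions are pulled above a reference point and undesirable ones pushed below it. The per-example loss can be sketched as follows (a minimal illustration of the published formulation, not the actual training code behind this checkpoint; the function name and default weights are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logratio, ref_kl, desirable,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Sketch of the per-example KTO loss.

    policy_logratio: log pi_theta(y|x) - log pi_ref(y|x) for this completion
    ref_kl: batch-level estimate of KL(pi_theta || pi_ref), the reference point
    desirable: True for accepted completions, False for rejected ones
    """
    if desirable:
        # Reward log-ratios above the KL reference point.
        value = lambda_d * sigmoid(beta * (policy_logratio - ref_kl))
        return lambda_d - value
    else:
        # Penalize log-ratios above the reference point (mirror image).
        value = lambda_u * sigmoid(beta * (ref_kl - policy_logratio))
        return lambda_u - value
```

Note that, unlike DPO, each example carries only a binary desirable/undesirable label, so accepted and rejected completions need not come in matched pairs.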
Key Training Details
The model was trained on 8xA100 GPUs using the Axolotl framework. The training incorporated several specialized datasets, primarily focusing on preference learning and instruction following:
- Delta-Vector/Tauri-IFeval-Dans-Tulu-KTO: Likely instruction-following data in the style of the IFEval benchmark, mixed with Tulu-derived material.
- Delta-Vector/Tauri-Opus-accepted-hermes-rejected-shuffled: Preference data from Opus and Hermes.
- Delta-Vector/Tauri-Opus-Accepted-GPT-Rejected-Opus-Writing-Prompts: Further preference data related to writing prompts.
- Delta-Vector/Tauri-Helpsteer3-Edit and Delta-Vector/Tauri-Helpsteer-3-Preference-KTO: Edit and preference data apparently derived from the HelpSteer3 helpfulness datasets.
- NewEden/Purpura-Arkhaios-CC-KTO: Additional preference data.
- Delta-Vector/Tauri-KTO-Instruct-Mix and Delta-Vector/Tauri-Synth-1-KTO-R1-No-Think: Synthetic instruction and preference mixes.
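Several of the datasets above are described as accepted/rejected pairs, while KTO itself trains on unpaired examples carrying only a desirable/undesirable label. Converting paired preference data into that format is straightforward; a minimal sketch (field names follow the prompt/completion/label convention used for unpaired preference data, e.g. in TRL's KTO trainer — this is an illustration, not Delta-Vector's actual pipeline):

```python
def pairs_to_kto(rows):
    """Flatten paired preference rows into KTO's unpaired format.

    Input rows:  {"prompt": ..., "chosen": ..., "rejected": ...}
    Output rows: {"prompt": ..., "completion": ..., "label": bool}
    """
    out = []
    for r in rows:
        # The chosen completion becomes a desirable example...
        out.append({"prompt": r["prompt"],
                    "completion": r["chosen"],
                    "label": True})
        # ...and the rejected completion an undesirable one.
        out.append({"prompt": r["prompt"],
                    "completion": r["rejected"],
                    "label": False})
    return out
```

Because the labels are independent, the two halves of a pair can also be subsampled or reweighted separately, which is one practical reason to prefer the unpaired format.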
Intended Use
Given its KTO fine-tuning on preference and instruction datasets, this model is primarily suited to chat-based applications where aligned, helpful, instruction-following responses matter. Users seeking a conversational model tuned through preference learning may find this checkpoint valuable, though the developer again points to the MS3.2 Winton train as the better all-round choice.