udkai/Turdus

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Jan 12, 2024 | License: cc-by-nc-4.0 | Architecture: Transformer | Open Weights

udkai/Turdus is a 7-billion-parameter language model developed by Daniel Devatman Hromada (UDK dot AI). It is a direct-preference-optimized (DPO) version of NeuralMarcoro14-7B, fine-tuned for one epoch on a specially modified Winogrande dataset. The model shows a subtle increase in average accuracy on non-Winogrande benchmarks such as ARC, HellaSwag, and TruthfulQA, suggesting an unusual DPO contamination effect. It is primarily notable for its experimental methodology: probing how a narrowly targeted DPO dataset affects broader benchmark performance.

Overview

udkai/Turdus is a 7-billion-parameter language model developed by Daniel Devatman Hromada (UDK dot AI). It is a direct-preference-optimized (DPO) variant of the mlabonne/NeuralMarcoro14-7B base model. Unlike its predecessor udkai/Garrulus, Turdus was trained for a single epoch, on a dataset composed primarily of specially modified Winogrande prompts.
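The single-epoch DPO run described above can be sketched with the Hugging Face trl library. Everything below is an illustrative assumption rather than the author's published recipe: the preference pair, the beta value, and the exact trainer keyword names (which vary slightly across trl versions) are all placeholders.

```python
# Minimal sketch of a single-epoch DPO run over Winogrande-style
# preference pairs, assuming trl >= 0.12. Not the author's actual recipe.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mlabonne/NeuralMarcoro14-7B"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical preference pairs: each record holds a prompt plus a
# preferred ("chosen") and a dispreferred ("rejected") completion.
pairs = Dataset.from_dict({
    "prompt": ["The trophy doesn't fit in the suitcase because it is too"],
    "chosen": [" large."],
    "rejected": [" small."],
})

config = DPOConfig(
    output_dir="turdus-dpo",
    num_train_epochs=1,  # single epoch, as stated on the card
    beta=0.1,            # assumed DPO temperature; not given on the card
)

# In trl versions before 0.12 the keyword is `tokenizer=` rather than
# `processing_class=`.
trainer = DPOTrainer(model=model, args=config,
                     train_dataset=pairs, processing_class=tokenizer)
trainer.train()
```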

Key Characteristics & Findings

This model is the subject of an experiment in "Subtle DPO-Contamination with modified Winogrande." Despite the narrow focus of its training data, Turdus shows a slight improvement in average accuracy across several non-Winogrande benchmarks compared to its base model. Specifically, it demonstrates:

  • ARC: Increased from 71.42% to 73.38%
  • HellaSwag: Increased from 87.59% to 88.56%
  • TruthfulQA: Increased from 65.64% to 67.11%

While the MMLU and GSM8K scores saw minor decreases, the average accuracy across the five non-Winogrande metrics (ARC, HellaSwag, MMLU, TruthfulQA, GSM8K) improved by about 0.2 percentage points, from 72.046% to 72.254%, as the sanity check below confirms. This suggests that even a highly specific and potentially 'contaminated' DPO dataset can have unexpected, subtle effects on broader model capabilities.
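The reported averages can be sanity-checked from the per-benchmark numbers. A minimal sketch in Python; since the individual MMLU and GSM8K scores are not listed above, the script backs out their combined change from the two reported five-metric averages:

```python
# Sanity check of the reported 5-metric averages. Only the three
# increased benchmarks are listed on the card; the combined MMLU + GSM8K
# change is implied by the reported averages.
base = {"ARC": 71.42, "HellaSwag": 87.59, "TruthfulQA": 65.64}
dpo  = {"ARC": 73.38, "HellaSwag": 88.56, "TruthfulQA": 67.11}

avg_base, avg_dpo = 72.046, 72.254                 # reported averages
gain_known = sum(dpo[k] - base[k] for k in base)   # +4.40 points
gain_total = (avg_dpo - avg_base) * 5              # +1.04 points

print(f"known gains:               {gain_known:+.2f} pts")
print(f"implied MMLU+GSM8K change: {gain_total - gain_known:+.2f} pts")
# -> roughly -3.36 points split across the two decreased benchmarks
```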

Use Cases

This model is particularly relevant for researchers and developers interested in:

  • Exploring the effects of DPO training on model performance.
  • Investigating dataset contamination and its subtle influences on benchmarks.
  • Understanding how specific fine-tuning strategies can impact diverse reasoning tasks.
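For any of these investigations, the model can be loaded through the standard transformers text-generation API. A minimal sketch follows; the dtype and generation settings are illustrative assumptions, not recommendations from the model card:

```python
# Minimal inference sketch for udkai/Turdus (device_map="auto" requires
# the `accelerate` package). Generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "udkai/Turdus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "The trophy doesn't fit in the suitcase because it is too"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```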