v000000/Qwen2.5-14B-Gutenberg-1e-Delta

Hugging Face
  • Type: Text generation
  • Concurrency Cost: 1
  • Model Size: 14.8B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: Sep 20, 2024
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights: Yes
  • Status: Warm

Qwen2.5-14B-Gutenberg-1e-Delta is a 14.8 billion parameter language model based on Qwen2.5-14B-Instruct, fine-tuned by v000000. It was trained for 1.25 epochs with Direct Preference Optimization (DPO) on the jondurbin/gutenberg-dpo-v0.1 dataset. It features a substantial 131,072-token context length, making it suitable for tasks requiring extensive contextual understanding and generation.
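Since this listing describes a hosted deployment, a quick way to try the model is through an OpenAI-compatible chat-completions client. The sketch below is illustrative only: the base URL, the API-key environment variable, and the availability of this exact model identifier on any given endpoint are assumptions, not details from this page.

```python
# Hypothetical example: querying the model through an OpenAI-compatible
# chat-completions endpoint. The base_url and environment variable are
# placeholders; substitute the values from your provider.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",   # assumed endpoint, not the real URL
    api_key=os.environ["PROVIDER_API_KEY"],  # assumed env var name
)

response = client.chat.completions.create(
    model="v000000/Qwen2.5-14B-Gutenberg-1e-Delta",
    messages=[
        {"role": "system", "content": "You are a skilled long-form fiction writer."},
        {"role": "user", "content": "Open a gothic short story set in a lighthouse."},
    ],
    max_tokens=512,
    temperature=0.8,
)
print(response.choices[0].message.content)
```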


Qwen2.5-14B-Gutenberg-1e-Delta Overview

This model, developed by v000000, is a 14.8 billion parameter variant of the Qwen2.5-14B-Instruct architecture. It was fine-tuned with Direct Preference Optimization (DPO) for 1.25 epochs on jondurbin/gutenberg-dpo-v0.1, a preference dataset built from public-domain books and intended to steer the model toward more natural long-form, novel-style prose.

Key Characteristics

  • Base Model: Qwen2.5-14B-Instruct
  • Parameter Count: 14.8 billion parameters
  • Context Length: Supports an extensive context window of 131,072 tokens
  • Training Method: Fine-tuned with Direct Preference Optimization (DPO) on the jondurbin/gutenberg-dpo-v0.1 dataset (a loading sketch follows this list)
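For the open weights themselves, a minimal loading sketch with the Hugging Face transformers library might look like the following; the dtype, device placement, and generation settings are illustrative choices, not recommendations from the model author.

```python
# Illustrative sketch: loading the open weights with Hugging Face transformers.
# Generation parameters below are arbitrary examples, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/Qwen2.5-14B-Gutenberg-1e-Delta"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 14.8B model needs roughly 30 GB in bf16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Continue this scene in the style of a 19th-century novel: ..."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```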

Performance Metrics

Evaluations on the Open LLM Leaderboard indicate the following performance (a reproduction sketch follows this list):

  • Average Score: 32.11
  • IFEval (0-shot): 80.45
  • BBH (3-shot): 48.62
  • MMLU-PRO (5-shot): 43.67
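In principle these numbers could be checked locally with EleutherAI's lm-evaluation-harness. The sketch below assumes the leaderboard_* task names from the harness's Open LLM Leaderboard task group; exact task names and arguments vary between harness versions, so treat this as an outline rather than an exact reproduction recipe.

```python
# Hedged sketch: running leaderboard-style evaluations with lm-evaluation-harness.
# Task names are assumed from the harness's "leaderboard" task group and may
# differ by version; check the harness's task list before running.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=v000000/Qwen2.5-14B-Gutenberg-1e-Delta,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_bbh", "leaderboard_mmlu_pro"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```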

Potential Use Cases

Given its DPO fine-tuning on a Gutenberg-derived dataset and large context window, this model could be particularly effective for:

  • Tasks requiring nuanced understanding and generation based on extensive text.
  • Applications benefiting from preference-aligned outputs.
  • Scenarios where a large context is crucial for maintaining coherence and relevance across long interactions or documents (a simple context-budget check is sketched below).
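For the long-document scenario, a simple guard is to count tokens with the model's own tokenizer before sending a request, so the input plus the expected continuation stays inside the window. The 131,072 figure comes from the description above; the file path and the reserved output budget are placeholders.

```python
# Sketch: checking whether a long document fits in the advertised context window.
# MAX_CTX comes from the model description; the file path is a placeholder.
from transformers import AutoTokenizer

MAX_CTX = 131_072
tokenizer = AutoTokenizer.from_pretrained("v000000/Qwen2.5-14B-Gutenberg-1e-Delta")

with open("long_document.txt", encoding="utf-8") as f:
    text = f.read()

n_tokens = len(tokenizer(text).input_ids)
budget = MAX_CTX - 1024  # reserve room for the generated continuation
print(f"{n_tokens} tokens; fits within budget: {n_tokens <= budget}")
```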