SanjiWatsuki/Lelantos-DPO-7B

Text generation · Model size: 7B · Quant: FP8 · Context length: 4K · Published: Jan 12, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open weights

SanjiWatsuki/Lelantos-DPO-7B is a 7 billion parameter language model developed by SanjiWatsuki, fine-tuned using Direct Preference Optimization (DPO). This model demonstrates strong performance across various benchmarks, achieving an average score of 58.54% across AGIEval, GPT4All, TruthfulQA, and Bigbench. It is particularly suited for general-purpose language understanding and generation tasks where robust performance on common reasoning and knowledge-based evaluations is critical.


Lelantos-DPO-7B: A DPO-Fine-Tuned 7B Language Model

Lelantos-DPO-7B is a 7 billion parameter language model developed by SanjiWatsuki, distinguished by its fine-tuning using Direct Preference Optimization (DPO). This optimization method enhances the model's ability to align with human preferences, leading to improved performance on a range of evaluative benchmarks.
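To make the training objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. This is an illustrative implementation of the published DPO formulation, not code from this model's actual training run; the function name, the toy log-probabilities, and the `beta` value are all assumptions for the example.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair (illustrative, not from this repo).

    Inputs are total log-probabilities of the chosen and rejected responses
    under the policy being trained and under a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy numbers: the policy already prefers the chosen response slightly...
base = dpo_loss(-10.0, -12.0, -11.0, -11.0)
# ...and widening that preference drives the loss down, which is the
# gradient signal that aligns the model with the preference data.
better = dpo_loss(-9.0, -13.0, -11.0, -11.0)
```

Minimizing this loss pushes the policy to assign relatively higher probability to preferred responses, which is the mechanism behind the TruthfulQA and alignment gains discussed below.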

Key Capabilities & Performance

The model demonstrates solid performance across several established benchmarks, with an overall average score of 58.54%.

  • AGIEval: Achieves an average of 45.47%, with notable scores in tasks like agieval_sat_en (76.70%) and agieval_lsat_rc (65.06%).
  • GPT4All: Scores an average of 75.0%, performing well on arc_easy (85.40%), boolq (87.25%), and winogrande (77.27%).
  • TruthfulQA: Records an average of 67.05% on the truthfulqa_mc benchmark, indicating a good capacity for generating truthful and informative responses.
  • Bigbench: Attains an average of 46.64%, showing competence in tasks such as bigbench_sports_understanding (73.23%) and bigbench_snarks (72.38%).

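The 58.54% headline figure appears to be the unweighted mean of the four suite averages above, which is easy to verify:

```python
# Suite averages as reported on the model card.
suite_averages = {
    "AGIEval": 45.47,
    "GPT4All": 75.00,
    "TruthfulQA": 67.05,
    "Bigbench": 46.64,
}

# Unweighted mean across the four suites.
overall = sum(suite_averages.values()) / len(suite_averages)
print(round(overall, 2))  # → 58.54
```

Note that an unweighted mean gives each suite equal influence regardless of how many sub-tasks it contains, so the headline number leans on the strong GPT4All result as much as on the weaker AGIEval one.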
What Makes This Different?

Lelantos-DPO-7B stands out due to its DPO fine-tuning, which is designed to improve response quality and alignment. Compared to its base model, Lelantos-7B, the DPO version improves the overall average by half a point (58.54% vs. 58.04%), with the largest gain on TruthfulQA (67.05% vs. 64.93%), suggesting enhanced truthfulness and preference alignment.

Should I Use This for My Use Case?

This model is a strong candidate for applications requiring reliable general-purpose language understanding and generation. Its balanced performance across diverse benchmarks makes it suitable for tasks such as:

  • Question Answering: Especially where factual accuracy and reasoning are important.
  • Content Generation: For producing coherent and contextually relevant text.
  • Conversational AI: Where aligned and truthful responses are desired.

Consider Lelantos-DPO-7B if your application benefits from a 7B model with demonstrated capabilities in reasoning, knowledge recall, and preference alignment.