THU-KEG/ADELIE-DPO-1.5B

Text generation · Model size: 1.5B · Quant: BF16 · Context length: 32K · Published: Nov 4, 2024 · Architecture: Transformer

THU-KEG/ADELIE-DPO-1.5B is a 1.5 billion parameter language model developed by Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, and Juanzi Li, fine-tuned from Qwen2.5-1.5B. It is specifically aligned for Information Extraction (IE) tasks, including closed, open, and on-demand IE, utilizing a direct preference optimization (DPO) objective. The model demonstrates state-of-the-art performance among open-source models on various IE benchmarks while maintaining general language capabilities.


ADELIE-DPO-1.5B: Aligned for Information Extraction

ADELIE-DPO-1.5B is a 1.5 billion parameter language model developed by Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, and Juanzi Li, specifically designed for Information Extraction (IE) tasks. It is fine-tuned from the Qwen2.5-1.5B base model and utilizes a Direct Preference Optimization (DPO) objective after initial instruction tuning on a high-quality IE alignment corpus called IEInstruct.
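A minimal sketch of querying the model for an IE task with the Hugging Face `transformers` library. The instruction wordings below are illustrative placeholders, not the exact IEInstruct templates, and `build_ie_prompt`/`run_extraction` are hypothetical helper names:

```python
def build_ie_prompt(task: str, text: str) -> str:
    """Compose a simple instruction prompt for one of the three IE settings
    the model is aligned for. Wording is illustrative, not the official template."""
    instructions = {
        "closed": "Extract all (subject, relation, object) triples that match the given schema.",
        "open": "Extract all relational triples you can find in the text.",
        "on-demand": "Extract the fields the user asks for and return them as a table.",
    }
    return f"{instructions[task]}\n\nText: {text}\n\nAnswer:"


def run_extraction(text: str, task: str = "closed") -> str:
    """Load ADELIE-DPO-1.5B and generate an extraction for the given text.
    Heavy imports are kept local so prompt building stays dependency-free."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "THU-KEG/ADELIE-DPO-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_ie_prompt(task, text), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For example, `run_extraction("Marie Curie won the Nobel Prize in Physics in 1903.")` would download the checkpoint and return the model's extracted triples as text; at BF16 the 1.5B weights fit comfortably on a single consumer GPU.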

Key Capabilities

  • Specialized Information Extraction: Excels across various IE tasks, including closed IE, open IE, and on-demand IE.
  • State-of-the-Art Performance: Achieves competitive F1 scores on IE benchmarks, outperforming open-source baselines such as Llama2 7B and its own base model, Qwen2.5 1.5B, on IE tasks.
  • General Capability Preservation: Experimental results indicate that its general language capabilities do not significantly decline despite its IE specialization.
  • DPO Alignment: Benefits from Direct Preference Optimization, enhancing its alignment for IE tasks.

Performance Highlights

Compared to Qwen2.5 1.5B, ADELIE-DPO-1.5B shows significant improvements in IE performance:

  • Closed IE: 38.5% F1 (vs. 16.5% for Qwen2.5 1.5B)
  • Open IE: 45.6% F1 (vs. 14.2% for Qwen2.5 1.5B)
  • On-demand IE: 59.2% F1 (vs. 20.5% for Qwen2.5 1.5B)
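The F1 scores above balance precision and recall over extracted items. As a sketch, treating each extraction as a (subject, relation, object) triple, set-level F1 can be computed like this (`triple_f1` is an illustrative helper, not the benchmarks' official scorer):

```python
def triple_f1(predicted, gold):
    """F1 between predicted and gold extractions, treated as sets of triples."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set or not gold_set:
        return 0.0
    tp = len(pred_set & gold_set)          # exact-match true positives
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For instance, one correct and one spurious triple against a two-triple gold set gives precision 0.5, recall 0.5, and hence F1 0.5. Real benchmark scorers often add normalization or partial-match credit on top of this.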

Good For

  • Developers requiring a compact yet powerful model for diverse information extraction applications.
  • Use cases where precise and efficient extraction of structured or unstructured information is critical.
  • Research and development in natural language understanding focusing on IE tasks.