Ihor/Text2Graph-R1-Qwen2.5-0.5b

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Tool Calling: Supported · Published: Jan 30, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Ihor/Text2Graph-R1-Qwen2.5-0.5b is a 0.5 billion parameter language model based on the Qwen2.5 architecture, specifically fine-tuned for text-to-graph information extraction. This model specializes in identifying named entities and extracting relationships between them from unstructured text, outputting the results in a structured JSON format. It was trained using a combination of supervised learning and Group Relative Policy Optimization (GRPO) to enhance its accuracy in graph extraction and ensure well-formed JSON output. Its primary application is in tasks requiring automated knowledge graph construction or structured data extraction from natural language.

Text2Graph-R1-Qwen2.5-0.5b Overview

This model, developed by Ihor, is a specialized 0.5 billion parameter language model built on the Qwen2.5-0.5B-Instruct base. Its core function is text-to-graph information extraction: it identifies named entities and the relationships between them in text and returns them as structured JSON. It reproduces the DeepSeek R1 training approach, applied to this task.

Key Capabilities

  • Named Entity Recognition (NER): Identifies unique and contextually relevant entities from text.
  • Relation Extraction (RE): Infers meaningful relationships between identified entities.
  • Structured Output: Generates annotated data in a predefined JSON format, including entity types, text, IDs, and head-tail relationships.
  • Reinforcement Learning Enhanced: Trained with Group Relative Policy Optimization (GRPO) using three reward functions (JSON format validation, JSON consistency, and F1 score), which improves extraction quality.
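
The structured output described above can be checked before it is passed downstream. The following is a minimal sketch, not the model's documented schema: the key names (`entities`, `relations`, `id`, `type`, `text`, `head`, `tail`), the sample text, and the `parse_graph` helper are all assumptions made for illustration.

```python
import json

# Hypothetical model output. The key names are assumptions based on the
# description "entity types, text, IDs, and head-tail relationships".
SAMPLE_OUTPUT = """
{
  "entities": [
    {"id": 0, "type": "person", "text": "Marie Curie"},
    {"id": 1, "type": "location", "text": "Warsaw"}
  ],
  "relations": [
    {"head": 0, "tail": 1, "type": "born_in"}
  ]
}
"""


def parse_graph(raw):
    """Parse raw JSON text and check that it forms a consistent graph."""
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    data = json.loads(raw)
    entities = data["entities"]
    relations = data["relations"]

    # Every entity needs an ID, a type, and the text span it came from.
    for entity in entities:
        missing = {"id", "type", "text"} - entity.keys()
        if missing:
            raise ValueError(f"entity is missing fields: {sorted(missing)}")

    # Every relation must point at entity IDs that actually exist.
    known_ids = {entity["id"] for entity in entities}
    for relation in relations:
        if relation["head"] not in known_ids or relation["tail"] not in known_ids:
            raise ValueError(f"relation refers to an unknown entity: {relation}")

    return entities, relations


entities, relations = parse_graph(SAMPLE_OUTPUT)
print(len(entities), "entities,", len(relations), "relations")
```

Validating IDs up front means a dangling relation is rejected at parse time rather than surfacing later as a missing node in the graph.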

Good For

  • Automated construction of knowledge graphs from textual data.
  • Extracting structured information (entities and relations) from documents.
  • Applications requiring precise, machine-readable output for downstream processing.
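
For knowledge graph construction, the extracted entities and relations can be turned into (head, relation, tail) triples and then into an adjacency map. This is a small sketch under the same assumed key names as the output description (`id`, `text`, `head`, `tail`, `type`); the sample data and the helpers `to_triples` and `to_adjacency` are hypothetical and are not part of the model or any library.

```python
from collections import defaultdict

# Hypothetical parsed model output; field names are assumptions.
ENTITIES = [
    {"id": 0, "type": "person", "text": "Marie Curie"},
    {"id": 1, "type": "location", "text": "Warsaw"},
]
RELATIONS = [
    {"head": 0, "tail": 1, "type": "born_in"},
]


def to_triples(entities, relations):
    """Resolve head/tail IDs to entity text and return (head, relation, tail) triples."""
    text_by_id = {entity["id"]: entity["text"] for entity in entities}
    return [
        (text_by_id[rel["head"]], rel["type"], text_by_id[rel["tail"]])
        for rel in relations
    ]


def to_adjacency(triples):
    """Group triples by head entity to form a simple directed graph."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


triples = to_triples(ENTITIES, RELATIONS)
graph = to_adjacency(triples)
print(triples)
print(graph)
```

The triple list is a convenient interchange form because most graph stores and graph libraries can ingest edges in this shape.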

Training Methodology

The model was trained with a hybrid approach: supervised learning followed by reinforcement learning. The GRPO phase ran for over 1,000 steps with reward functions that score correct JSON formatting, output consistency, and F1 accuracy in entity and relation extraction. The F1 reward kept growing throughout training, while the JSON-specific rewards saturated quickly because the initial supervised stage had already taught the model the output format.
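
To make the reward design concrete, here is a minimal sketch of two rewards of the kind described above, plus the group-relative normalization that gives GRPO its name (each sampled completion's reward is standardized against the other completions for the same prompt). These are illustrative stand-ins, not the author's actual reward functions: the function names, the exact-match F1 definition, and the 0/1 format reward are assumptions, and the JSON consistency reward is omitted because its definition is not given here.

```python
import json
from statistics import mean, pstdev


def json_format_reward(completion):
    """Return 1.0 if the completion parses as JSON, else 0.0 (assumed scoring)."""
    try:
        json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


def f1_reward(predicted, gold):
    """Exact-match F1 between predicted and gold items (e.g. relation triples)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(reward - mu) / (sigma + eps) for reward in rewards]


# Two hypothetical completions for one prompt, scored against gold triples.
gold = {("Marie Curie", "born_in", "Warsaw")}
group = [
    {("Marie Curie", "born_in", "Warsaw")},  # correct extraction
    {("Marie Curie", "lives_in", "Paris")},  # wrong extraction
]
rewards = [f1_reward(predicted, gold) for predicted in group]
print(rewards, group_relative_advantages(rewards))
```

A binary format reward like this one has little room to improve once nearly all completions parse, which is consistent with the quick saturation reported for the JSON-specific rewards, whereas an F1-style reward remains graded.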