RAGU-lm: Specialized Russian Knowledge Graph Extraction Model
RAGU-lm is a fine-tuned Qwen-3-0.6B model developed by RaguTeam, engineered specifically for knowledge graph construction from Russian text. With 0.8 billion parameters and a 40,960-token context window, it focuses on extracting structured semantic information.
Key Capabilities
- Entity Extraction: Identifies and lists unnormalized named entities from text.
- Entity Normalization: Standardizes extracted entities.
- Entity Description Generation: Creates contextual descriptions for entities based on the source text.
- Relation Extraction: Identifies the relationship between a pair of entities and generates a description of it from the source text.
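The four tasks above feed naturally into a post-processing step that turns raw model text into structured records. A minimal sketch, assuming a hypothetical output format of one entity per line as `surface -> normalized` (the actual RAGU-lm output format is not documented here):

```python
# Hypothetical post-processing sketch. Assumption: the model emits one
# entity per line in the form "<surface form> -> <normalized form>".
def parse_entities(model_output: str) -> list[dict]:
    """Parse hypothetical 'surface -> normalized' lines into records."""
    records = []
    for line in model_output.strip().splitlines():
        if "->" not in line:
            continue  # skip lines that do not match the assumed format
        surface, normalized = (part.strip() for part in line.split("->", 1))
        records.append({"surface": surface, "normalized": normalized})
    return records

sample = """Москве -> Москва
МГУ -> Московский государственный университет"""
records = parse_entities(sample)
print(records)
```

The parsed records pair each unnormalized surface form with its standardized entity name, which is exactly the shape downstream description generation and relation extraction consume.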
Performance Highlights
The model performs strongly on its specialized tasks, outperforming the much larger Qwen-2.5-14B-Instruct model in both entity and relation extraction on an improved NEREL dataset:
- Entity Extraction: F1 0.60 (macro average), Precision 0.70, Recall 0.52.
- Relation Extraction: F1 0.71 (macro average), Precision 0.84, Recall 0.71.
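Note that a macro-averaged F1 is the mean of per-class F1 scores, so it need not equal the harmonic mean of the macro precision and macro recall. A small sketch with illustrative per-class numbers (not the actual NEREL per-class results) shows the difference:

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

# Illustrative per-class (precision, recall) pairs -- NOT real NEREL figures.
classes = [(0.9, 0.8), (0.5, 0.3)]

macro_p = sum(p for p, _ in classes) / len(classes)   # 0.7
macro_r = sum(r for _, r in classes) / len(classes)   # 0.55
macro_f1 = sum(f1(p, r) for p, r in classes) / len(classes)

# Macro F1 (mean of per-class F1s) vs F1 of the macro-averaged P and R:
print(round(macro_f1, 3), round(f1(macro_p, macro_r), 3))  # 0.611 0.616
```

This is why reported macro F1 values can sit below (or above) what plugging the macro precision and recall into the F1 formula would suggest.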
Training Details
RAGU-lm was trained on an enhanced version of the NEREL dataset, augmented with true negative examples for relation extraction and with descriptions generated by the Claude model. Training uses four distinct instruction formats, one for each supported task.
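Since the actual instruction formats are not reproduced here, the following is only a hypothetical sketch of how four per-task prompt templates might be organized; the wording, field names, and template strings are assumptions for illustration:

```python
# Hypothetical instruction templates for the four tasks. The real RAGU-lm
# formats are not published in this document; everything below is assumed.
TEMPLATES = {
    "entity_extraction": "Выдели именованные сущности из текста:\n{text}",
    "entity_normalization": "Приведи сущность к нормальной форме: {entity}",
    "entity_description": "Опиши сущность «{entity}» по тексту:\n{text}",
    "relation_extraction": (
        "Определи отношение между «{head}» и «{tail}» по тексту:\n{text}"
    ),
}

def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for the given task with the supplied fields."""
    return TEMPLATES[task].format(**fields)

prompt = build_prompt(
    "relation_extraction",
    head="МГУ", tail="Москва", text="МГУ находится в Москве.",
)
print(prompt)
```

Keeping one template per task makes it easy to route a single fine-tuned model across all four extraction stages.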
Ideal Use Cases
This model is particularly well-suited for applications requiring precise semantic information extraction and knowledge graph population from Russian language texts, such as:
- Automated knowledge base construction
- Information retrieval systems
- Advanced text analytics in Russian
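For knowledge base construction, the model's entity and relation outputs can be assembled into a graph. A minimal sketch, where the record shapes and field names are assumptions rather than RAGU-lm's actual output schema:

```python
# Knowledge-graph assembly sketch. The record fields below (name,
# description, head, tail, label) are assumed shapes, not RAGU-lm's
# documented output schema.
from collections import defaultdict

entities = [
    {"name": "Москва", "description": "Столица России"},
    {"name": "МГУ", "description": "Университет в Москве"},
]
relations = [
    {"head": "МГУ", "tail": "Москва", "label": "находится_в"},
]

# Nodes keep entity descriptions; edges form an adjacency list.
graph = {
    "nodes": {e["name"]: e["description"] for e in entities},
    "edges": defaultdict(list),
}
for r in relations:
    graph["edges"][r["head"]].append((r["label"], r["tail"]))

print(graph["edges"]["МГУ"])  # [('находится_в', 'Москва')]
```

The same structure drops straight into graph libraries or triple stores once the extraction pipeline is in place.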