ArxivLlama: Node Classification on Text-Attributed Graphs
ArxivLlama is an 8-billion-parameter Llama model developed by xinyifang, fine-tuned specifically for node classification on text-attributed graphs (TAGs). It is based on the unsloth/meta-llama-3.1-8b-instruct-bnb-4bit model and was trained with Unsloth and Hugging Face's TRL library, which enable faster, more memory-efficient fine-tuning.
Key Capabilities
- Specialized Node Classification: Designed to classify nodes within text-attributed graphs, particularly demonstrated for categorizing arXiv computer science papers.
- Multi-Profiling Data Augmentation: Utilizes a novel multi-profiling framework to augment data, increasing the diversity and quantity of training samples for improved performance.
- Efficient Graph Processing: Addresses challenges of input window size limitations and computational overhead for large-scale graphs by constructing concise, informative fine-tuning prompts and leveraging neighboring node information.
- High Accuracy: Achieves 74.31% accuracy on the ogbn-arxiv dataset and 85.15% accuracy on the ogbn-products dataset, outperforming 11 state-of-the-art baselines.
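The "concise, informative fine-tuning prompts" with limited neighbor information described above can be sketched roughly as follows. This is an illustrative assumption, not the released code: the function name, field names, and truncation limits are hypothetical, but they show the general idea of packing a node's text plus a capped number of neighbor titles into a prompt that fits the LLM's context window.

```python
# Hypothetical sketch: build a concise node-classification prompt from a paper's
# text and a limited number of neighboring (cited/citing) paper titles.
# All names and limits here are illustrative assumptions, not ArxivLlama's code.

def build_node_prompt(title, abstract, neighbor_titles,
                      max_neighbors=3, max_abstract_chars=800):
    """Assemble a short classification prompt for one node of a citation graph.

    Only the first `max_neighbors` neighbor titles are kept and the abstract is
    truncated, so the prompt stays inside the model's input window even on
    large-scale graphs like ogbn-arxiv.
    """
    abstract = abstract[:max_abstract_chars]
    neighbors = neighbor_titles[:max_neighbors]
    lines = [
        "Classify the following arXiv CS paper into one of the 40 ogbn-arxiv categories.",
        f"Title: {title}",
        f"Abstract: {abstract}",
    ]
    if neighbors:
        lines.append("Cited/citing papers: " + "; ".join(neighbors))
    lines.append("Category:")
    return "\n".join(lines)

prompt = build_node_prompt(
    "Attention Is All You Need",
    "We propose the Transformer, a model architecture based solely on attention...",
    ["BERT: Pre-training of Deep Bidirectional Transformers",
     "Neural Machine Translation by Jointly Learning to Align and Translate",
     "Sequence to Sequence Learning with Neural Networks",
     "Language Models are Few-Shot Learners"],
)
print(prompt)
```

Note how the fourth neighbor is dropped by the cap: bounding the neighborhood is what keeps prompt length (and hence compute cost) predictable regardless of a node's degree.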
Good For
- Academic Research Classification: Well suited to classifying scientific articles or similar text-attributed data into predefined categories.
- Graph Representation Learning: Applicable to research and applications requiring LLMs to process and understand text-attributed graphs efficiently.
- Limited Computational Environments: The 4-bit quantized base model and concise prompts let it handle complex graph structures under tight computational budgets.
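The multi-profiling data augmentation listed under Key Capabilities can be sketched along these lines. The templates and function names below are hypothetical, chosen only to illustrate the idea: each node's text is rendered through several "profile" views, multiplying the number of fine-tuning samples per node and diversifying how the same paper is presented to the model.

```python
# Hypothetical sketch of multi-profiling augmentation: render one node's text
# through several profile templates to produce multiple training samples.
# The templates below are illustrative assumptions, not the paper's prompts.

PROFILE_TEMPLATES = [
    "Summary view: {title}. {abstract}",
    "Keyword view: key topics of '{title}' drawn from: {abstract}",
    "Citation view: '{title}', which relates to: {neighbors}",
]

def make_profiles(title, abstract, neighbor_titles):
    """Return one training sample per profile template for a single node."""
    neighbors = "; ".join(neighbor_titles) if neighbor_titles else "none"
    return [
        template.format(title=title, abstract=abstract, neighbors=neighbors)
        for template in PROFILE_TEMPLATES
    ]

samples = make_profiles(
    "Graph Attention Networks",
    "We present graph attention networks (GATs), leveraging masked self-attention.",
    ["Semi-Supervised Classification with Graph Convolutional Networks"],
)
for sample in samples:
    print(sample)
```

Each node thus contributes as many samples as there are templates, which is one plausible way to increase both the diversity and the quantity of fine-tuning data without collecting new labels.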
This model's development is detailed in the paper "LLM Profiling and Fine-Tuning with Limited Neighbor Information for Node Classification on Text-Attributed Graphs" presented at the 2025 IEEE International Conference on Big Data (BigData).