windlx/url-classifier-model

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Feb 24, 2026 | License: mit | Architecture: Transformer | Open Weights

The windlx/url-classifier-model is a 1.5 billion parameter URL type classification model, fine-tuned by windlx using LoRA on the Qwen2.5-1.5B base model. It is specifically designed to classify URLs as either list pages or detail pages, achieving 99% accuracy on its validation set. This model is optimized for applications requiring automated URL categorization based solely on the URL string, such as SEO analysis, web crawling, and website analytics.

URL Page Type Classifier

This model, developed by windlx, is a specialized URL classifier built on the Qwen2.5-1.5B base model. It was fine-tuned with LoRA (r=16, alpha=32) to distinguish list pages from detail pages given only the URL string. Only 1.18% of the model's 1.5 billion parameters are trainable during fine-tuning; the base weights stay frozen, which keeps training inexpensive.
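
The 1.18% figure can be sanity-checked with a little arithmetic. The sketch below is an illustration, not the author's training script: it assumes LoRA adapters were attached to all seven linear projections in every decoder block (the card does not say which modules were targeted), and the layer shapes and base parameter count are recalled from the public Qwen2.5-1.5B configuration rather than taken from this card. Each adapted layer with shape (in, out) adds r x (in + out) parameters; alpha only scales the adapter output (by alpha / r = 2 here) and adds no parameters.

```python
# (in_features, out_features) of the linear layers in one decoder block.
# ASSUMPTION: shapes recalled from the public Qwen2.5-1.5B config
# (hidden 1536, MLP 8960, 12 query heads, 2 KV heads of dim 128).
LAYER_SHAPES = {
    "q_proj": (1536, 1536),
    "k_proj": (1536, 256),
    "v_proj": (1536, 256),
    "o_proj": (1536, 1536),
    "gate_proj": (1536, 8960),
    "up_proj": (1536, 8960),
    "down_proj": (8960, 1536),
}
NUM_LAYERS = 28               # ASSUMPTION: decoder blocks in the base model
BASE_PARAMS = 1_543_714_304   # ASSUMPTION: frozen base parameter count
RANK = 16                     # r, as stated in the model card


def lora_param_count(shapes, rank, num_layers):
    """Each adapted layer adds A (rank x in) and B (out x rank)."""
    per_block = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_block * num_layers


trainable = lora_param_count(LAYER_SHAPES, RANK, NUM_LAYERS)
percent = 100 * trainable / (BASE_PARAMS + trainable)
print(f"{trainable:,} trainable parameters ({percent:.2f}% of total)")
```

Under these assumptions the result rounds to 1.18%, which is consistent with the figure quoted above. Targeting fewer modules (for example only the attention projections) would give a smaller share.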

Key Capabilities

  • Binary URL Classification: Accurately identifies whether a URL points to a list page or a detail page.
  • High Accuracy: Achieves 99% accuracy on its validation set.
  • Efficient Fine-tuning: Trains only 1.18% of the parameters through LoRA adapters (r=16, alpha=32), which reduces the compute and memory needed for fine-tuning.
  • Dedicated Dataset: Trained on 10,000 balanced URL samples (5,000 list pages, 5,000 detail pages) from the IowaCat/page_type_inference_dataset.
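
A balanced training set like the one described above could be drawn roughly as follows. This is a minimal sketch, not the author's preprocessing code: the record layout (`url` and `label` keys), the label names, and the toy URLs are all hypothetical, and the real dataset's schema may differ.

```python
import random
from collections import defaultdict


def balanced_sample(records, per_class, label_key="label", seed=0):
    """Draw the same number of records from every class, then shuffle."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sample = []
    for label, group in sorted(by_label.items()):
        if len(group) < per_class:
            raise ValueError(f"class {label!r} has only {len(group)} records")
        sample.extend(rng.sample(group, per_class))
    rng.shuffle(sample)  # avoid feeding the trainer one class at a time
    return sample


# Toy records standing in for the real dataset (URLs are made up).
toy = [{"url": f"https://example.com/news/page/{i}", "label": "list"} for i in range(30)]
toy += [{"url": f"https://example.com/news/{i}.html", "label": "detail"} for i in range(12)]

subset = balanced_sample(toy, per_class=10)
```

With `per_class=5000` and the full dataset, the same function would yield the 5,000 / 5,000 split described in the card.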

Good For

  • Search Engine Optimization (SEO): Understanding website structure and page types for better indexing strategies.
  • Web Crawling: Optimizing crawler behavior by prioritizing or categorizing links based on page type.
  • Website Analytics: Gaining insights into the distribution and traffic patterns of different page types.
  • Large-scale URL Processing: Automating the classification of numerous URLs for various applications.
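
For the web crawling use case, one possible way to use the predictions is to order a crawl frontier by page type. The sketch below assumes a policy of visiting list pages first (since they tend to link to many detail pages); that policy, the `prioritized` helper, and the hard-coded labels are illustrative assumptions. In practice `classify` would wrap a call to the model, whose prompt and output format are not documented here.

```python
import heapq
from typing import Callable, Iterable, Iterator

# Lower number = crawled sooner. Visiting list pages first is an assumed policy.
PRIORITY = {"list": 0, "detail": 1}


def prioritized(urls: Iterable[str], classify: Callable[[str], str]) -> Iterator[str]:
    """Yield URLs ordered by page type, keeping discovery order within a type."""
    heap = []
    for order, url in enumerate(urls):
        rank = PRIORITY.get(classify(url), 2)  # unknown labels go last
        heapq.heappush(heap, (rank, order, url))
    while heap:
        yield heapq.heappop(heap)[2]


# Hard-coded labels stand in for model predictions (hypothetical values).
fake_labels = {
    "https://example.com/item/1": "detail",
    "https://example.com/category/shoes": "list",
    "https://example.com/item/2": "detail",
    "https://example.com/category/hats": "list",
}

crawl_order = list(prioritized(fake_labels, lambda url: fake_labels[url]))
print(crawl_order)
```

Passing the classifier in as a callable keeps the scheduling logic independent of how the model is served.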

Limitations

  • URL String Dependent: Classification relies solely on the URL string and does not access actual page content.
  • Path Conventions: Accuracy may be lower for websites whose URL paths do not follow common patterns.
  • Language Support: Primarily optimized for Chinese and English URLs.
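
To make the first limitation concrete, the sketch below lists everything a URL string actually exposes: a host, path segments, and query keys. It uses only Python's standard `urllib.parse`; the `url_signals` helper and the example URL are hypothetical and are not part of the model's pipeline. Anything not visible in these pieces, such as the page's real layout, is unavailable to a string-only classifier.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def url_signals(url: str) -> dict:
    """Split a URL into the parts a string-only classifier can see."""
    parts = urlsplit(url)
    # Percent-decoding makes non-ASCII (e.g. Chinese) path segments readable.
    segments = [unquote(s) for s in parts.path.split("/") if s]
    return {
        "host": parts.hostname,
        "segments": segments,
        "query_keys": sorted(parse_qs(parts.query)),
    }


info = url_signals("https://example.com/%E6%96%B0%E9%97%BB/list?page=2")
print(info)
```

Two pages whose URLs yield the same kind of segments and query keys look identical to the model, whatever their content.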