The zstanjj/HTML-Pruner-Llama-1B is a 1-billion-parameter Llama-based model developed by zstanjj for HTML pruning within Retrieval-Augmented Generation (RAG) systems. It reduces the length of HTML documents while retaining their semantic information, using a two-step block-tree-based pruning approach, and is particularly effective for optimizing HTML content for long-context LLMs by keeping only the relevant sections.
Overview
The zstanjj/HTML-Pruner-Llama-1B is a 1-billion-parameter model developed specifically for HTML pruning, a key component of the HtmlRAG system. It processes and condenses HTML documents so they can be used efficiently in Retrieval-Augmented Generation (RAG) pipelines, especially those built on long-context Large Language Models (LLMs).
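To make the "condensing" step concrete, here is a minimal, self-contained sketch of the kind of lossless cleaning applied before pruning: scripts, styles, comments, and tag attributes are stripped while the tag structure and visible text are kept. This is a toy illustration using only the Python standard library, not the actual implementation shipped in the `htmlrag` package, which handles many more cases.

```python
from html.parser import HTMLParser

# Tags whose contents carry no semantic information for RAG.
SKIP_TAGS = {"script", "style", "noscript"}

class Cleaner(HTMLParser):
    """Toy lossless-cleaning pass: drops scripts/styles/comments and
    tag attributes, keeps the tag structure and visible text."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # >0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(f"<{tag}>")  # attributes dropped to save tokens

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Keep visible text; collapse whitespace-only runs.
        if self.skip_depth == 0 and data.strip():
            self.out.append(data.strip())

def clean_html(html: str) -> str:
    parser = Cleaner()
    parser.feed(html)
    return "".join(parser.out)

# clean_html('<div><script>x=1</script><p class="a">Hi</p></div>')
# → '<div><p>Hi</p></div>'
```

Comments are dropped automatically because `HTMLParser.handle_comment` is a no-op unless overridden.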
Key Capabilities
- Lossless HTML Cleaning: Removes irrelevant content and compresses redundant structures in HTML while preserving all semantic information. This prepares HTML for RAG systems with long-context LLMs.
- Two-Step Block-Tree-Based HTML Pruning: Employs a two-stage pruning process based on a block tree structure. The first step uses an embedding model (e.g., BAAI/bge-large-en) to rank HTML blocks, and the second step refines this with a generative model (this Llama-1B model).
- Context Optimization: Reduces the token count of HTML documents, allowing more relevant information to fit within the context window of LLMs.
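The first pruning stage above can be sketched as follows: leaf blocks of the block tree are scored against the query and only the top-ranked blocks are kept, in their original document order. For self-containment this sketch substitutes a bag-of-words cosine score for the real embedding model (e.g., BAAI/bge-large-en); step 2 would then re-rank the surviving blocks with the generative 1B model. The function names here are illustrative, not the `htmlrag` package's API.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Bag-of-words cosine similarity (toy stand-in for an embedding model)."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prune_blocks(query: str, blocks: list[str], keep: int) -> list[str]:
    """Step 1 of the two-step pruning: rank leaf blocks against the query,
    keep the top `keep`, and preserve the original document order."""
    q = Counter(query.lower().split())
    ranked = sorted(range(len(blocks)),
                    key=lambda i: cosine(q, Counter(blocks[i].lower().split())),
                    reverse=True)
    kept = sorted(ranked[:keep])  # restore document order
    return [blocks[i] for i in kept]

blocks = ["Paris is the capital of France.",
          "Cookie policy and ads.",
          "France borders Spain."]
# prune_blocks("capital of France", blocks, 2) drops the boilerplate block.
```

Keeping survivors in document order matters: the pruned HTML must still read as a coherent document when passed to the chat model.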
Performance
When integrated into the HtmlRAG system with Llama-3.1-70B-Instruct as the chat model, HTML-Pruner-Llama-1B demonstrates competitive performance across various question-answering datasets, often outperforming traditional methods like BM25 and BGE on metrics such as Exact Match (EM) and ROUGE-L. For instance, it achieved 60.75 EM on NQ and 45.00 EM on HotpotQA, highlighting its effectiveness in improving RAG system accuracy by providing more focused HTML context.
Good For
- Developers building RAG systems that utilize HTML as a knowledge source.
- Applications requiring efficient processing of lengthy and complex HTML documents.
- Optimizing input context for LLMs to improve retrieval and generation quality.