metaresearch/PapersRAG-1.5B

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 12, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

PapersRAG-1.5B by Meta Research is a retrieval-augmented generation (RAG) system built on the Qwen 2.5 1.5B language model. It specializes in querying recent cs.CL scientific literature from arXiv and provides citation-backed answers by pairing a lightweight LLM with a continuously updated knowledge base. The system prioritizes factual accuracy: when no relevant information is found, it declines to answer rather than risk hallucinating, which suits it to scientific research assistance.


PapersRAG-1.5B: A Continuously Updated RAG System for Scientific Literature

PapersRAG-1.5B, developed by Meta Research, is a specialized retrieval-augmented generation (RAG) system designed to help researchers explore and answer questions about recent NLP papers from arXiv. It combines a lightweight Qwen 2.5 1.5B language model with a knowledge base of cs.CL paper abstracts that is updated daily.

Key Capabilities & Features

  • Retrieval-Augmented Generation: Uses a two-stage retrieval pipeline: dense embeddings retrieve an initial set of candidates, and a cross-encoder re-ranks them so that only the most relevant information reaches the language model.
  • Hallucination Prevention: Explicitly designed to prioritize faithful, citation-backed answers. If no relevant paper is found, the model states so rather than fabricating information.
  • Automated Daily Updates: The knowledge base automatically expands by approximately 100 new cs.CL papers from arXiv every day, ensuring access to the latest research without manual upkeep.
  • Citation-Backed Answers: Every generated answer includes the title of the paper it draws from, enhancing trustworthiness and verifiability.
  • Lightweight & Efficient: Built on a 1.5B-parameter language model, which keeps inference fast; answers remain coherent when the model is grounded with good context.
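
The retrieve, re-rank, and refuse-or-cite flow described in the list above can be sketched in a few lines. This is a minimal illustration, not the model's actual implementation: the names `Paper`, `first_stage_score`, `rerank_score`, `answer`, `top_k`, and `min_score` are hypothetical, word-count cosine similarity stands in for the dense embeddings, query-term coverage stands in for the cross-encoder, and the "answer" is simply the best abstract with its title attached rather than text generated by the language model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

NO_ANSWER = "No relevant paper was found in the index."


@dataclass
class Paper:
    title: str
    abstract: str


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_stage_score(query: str, doc: str) -> float:
    """Toy stand-in for dense-embedding similarity: cosine over word counts."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words in the doc."""
    q = set(tokens(query))
    return len(q & set(tokens(doc))) / len(q) if q else 0.0


def answer(query: str, papers: list[Paper], top_k: int = 3,
           min_score: float = 0.5) -> str:
    # Stage 1: keep the top_k candidates by the cheap similarity score.
    candidates = sorted(
        papers, key=lambda p: first_stage_score(query, p.abstract), reverse=True
    )[:top_k]
    if not candidates:
        return NO_ANSWER
    # Stage 2: re-rank the candidates and keep the best one.
    best = max(candidates, key=lambda p: rerank_score(query, p.abstract))
    # Refuse instead of answering when nothing is relevant enough.
    if rerank_score(query, best.abstract) < min_score:
        return NO_ANSWER
    # Attach the source title so the answer can be verified.
    return f"{best.abstract} [Source: {best.title}]"
```

Calling `answer("dense retrieval re-ranking", papers)` on a small list of `Paper` objects returns the best-matching abstract followed by its title, while a query with no overlapping terms returns the refusal message. The threshold `min_score` is an assumed knob; how the real system decides that no paper is relevant is not stated in this card.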

Good For

  • Scientific Research Assistance: Ideal for scientists and students needing to locate information within indexed NLP papers.
  • Comparative Analysis: Answering questions like "What are the latest trends in retrieval-augmented generation?" within the NLP domain.
  • Surfacing Specific Details: Extracting methodological details or findings from indexed paper abstracts.
  • Applications Requiring Factual Grounding: Use cases where avoiding hallucination and providing verifiable sources are critical.

Limitations

  • Scope: Limited to cs.CL papers from arXiv and only indexes abstracts, not full paper texts.
  • Language: English only.
  • General-Purpose Use: Not intended as a general-purpose chatbot; it answers only from its indexed knowledge base.