Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering

Task: Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Nov 25, 2025 | License: cc-by-nc-4.0 | Architecture: Transformer | Open Weights

The Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering model is an 8 billion parameter Llama-3.1-8B-Instruct fine-tune, developed by Yale-BIDS-Chen, specifically designed for evidence relevance classification in medical Retrieval-Augmented Generation (RAG) pipelines. It determines whether a given passage contains supporting evidence for a clinical query, outputting "Yes" or "No". This model aims to improve retrieval quality and build more reliable, interpretable medical RAG systems by filtering irrelevant passages.

Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering: Medical Evidence Classification

This model is a fine-tune of Llama-3.1-8B-Instruct, developed by Yale-BIDS-Chen, for evidence relevance classification within medical Retrieval-Augmented Generation (RAG) systems. It acts as a lightweight binary classifier: given a clinical query and a candidate passage, it judges whether the passage provides supporting evidence for the query.

Key Capabilities & Features

  • Medical Evidence Filtering: Classifies passages as "Yes" (contains supporting evidence) or "No" (does not contain supporting evidence) for a clinical query.
  • Improved RAG Quality: Designed to enhance the reliability and interpretability of medical RAG pipelines by filtering out irrelevant information before text generation.
  • Fine-tuned Performance: Reaches an F1 score of 0.623 on expert-annotated medical query-passage pairs, ahead of zero-shot baselines such as Llama-3.1-8B (0.521 F1) and GPT-4o (0.442 F1).
  • Training Data: Trained on 3,200 expert-labeled query-passage pairs, focusing on the specific task of evidence classification.
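Because the classifier's output is a bare "Yes" or "No", a caller needs to wrap the query and passage in a prompt and map the generated text back to a label. The following is a minimal sketch of that step. The prompt wording in `build_prompt` is an assumption made for illustration (the template used during fine-tuning is not given here), and both function names are hypothetical.

```python
from typing import Optional


def build_prompt(query: str, passage: str) -> str:
    """Assemble a classification prompt.

    ASSUMPTION: this wording is illustrative only; it is not the
    template the model was fine-tuned with.
    """
    return (
        "Does the passage contain supporting evidence for the clinical query? "
        "Answer Yes or No.\n\n"
        f"Query: {query}\n"
        f"Passage: {passage}\n"
        "Answer:"
    )


def parse_label(generated: str) -> Optional[bool]:
    """Map generated text to True (Yes), False (No), or None if unparseable."""
    words = generated.strip().lower().split()
    if not words:
        return None
    # Look only at the first word, ignoring surrounding punctuation.
    first = words[0].strip(".,:;!\"'")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None
```

Returning `None` for anything other than a clear "Yes" or "No" lets the surrounding pipeline decide how to treat unparseable outputs, for example by keeping or dropping the passage, rather than silently guessing.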

Intended Use Cases

  • Medical RAG Systems: Ideal for integration into medical question-answering systems to pre-filter retrieved documents.
  • Research Purposes: Intended for researchers working on improving the accuracy and efficiency of information retrieval in clinical contexts.
  • Building Interpretable AI: Contributes to more transparent RAG pipelines by explicitly identifying relevant evidence.
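The pre-filtering use case above can be sketched as a small function that sits between the retriever and the generator. This is an illustration under stated assumptions, not the authors' pipeline: `filter_passages` and `keyword_stub` are hypothetical names, and `keyword_stub` is a trivial stand-in so the example runs without downloading the model. In practice, the `classify` callable would wrap a call to the evidence-filtering model.

```python
import re
from typing import Callable, List


def filter_passages(
    query: str,
    passages: List[str],
    classify: Callable[[str, str], bool],
) -> List[str]:
    """Keep only the passages that `classify` marks as supporting evidence."""
    return [p for p in passages if classify(query, p)]


def keyword_stub(query: str, passage: str) -> bool:
    """Stand-in classifier (NOT the model): True if the query and passage
    share at least one alphanumeric token longer than three characters."""
    def tokens(text: str) -> set:
        return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 3}

    return bool(tokens(query) & tokens(passage))


if __name__ == "__main__":
    retrieved = [
        "Metformin lowered HbA1c by 1.1% in a randomized trial.",
        "The clinic is open on weekdays.",
    ]
    kept = filter_passages(
        "Does metformin reduce HbA1c in type 2 diabetes?",
        retrieved,
        keyword_stub,
    )
    print(kept)
```

Passing the classifier in as a callable keeps the filtering logic independent of how the model is served, so the stub can be swapped for a real model call without changing the rest of the pipeline.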

For detailed methodology and experimental results, refer to the associated paper: Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights.