biodatlab/ec-raft

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Jun 7, 2025License:llama3.1Architecture:Transformer Cold

EC-RAFT is an 8 billion parameter Retrieval-Augmented Fine-Tuning (RAFT) model, based on LLaMA-3.1-8B-Instruct, developed by biodatlab. It is specifically fine-tuned to automatically generate structured, high-quality clinical trial eligibility criteria (EC) from trial titles and descriptions. The model integrates domain-specific retrieval with synthesized intermediate reasoning steps, enabling it to produce clinically relevant and contextually appropriate EC sets. It outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines in this specialized task.

Loading preview...

EC-RAFT: Automated Clinical Trial Eligibility Criteria Generation

EC-RAFT is an 8 billion parameter model developed by biodatlab, fine-tuned from LLaMA-3.1-8B-Instruct using a Retrieval-Augmented Fine-Tuning (RAFT) approach. Its core function is to automatically generate structured, high-quality clinical trial eligibility criteria (EC) based on trial titles and descriptions.

Key Capabilities & Features

  • Specialized EC Generation: Designed specifically for creating clinically relevant and contextually appropriate eligibility criteria.
  • Retrieval-Augmented: Integrates domain-specific retrieval and synthesized intermediate reasoning steps, generated using Gemini-1.5-flash-002, for enhanced accuracy.
  • Performance: Outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines, as well as fine-tuned LLaMA and Meditron models, on EC generation tasks.
  • Clinically Validated: Achieves high correlation with human physician evaluations, with a Mean LLM-as-a-Judge Score of 1.7150 (0–3 scale) and a BERTScore of 86.23.
  • Fine-tuning: Utilizes Low-Rank Adaptation (LoRA) and is trained on a comprehensive dataset from ClinicalTrials.gov (267,347 trials).

Intended Use Cases

  • Researchers, Trial Designers, and Sponsors: Assists in drafting clinical trial eligibility criteria, reducing manual effort and improving consistency.
  • Automation: Automates EC generation for integration into trial registry platforms, clinical trial matching systems, and EC recommendation tools.

Limitations

  • Requires human validation before clinical use.
  • Primarily trained on public ClinicalTrials.gov data, which may limit generalization to rare diseases, specialized trial designs, or non-public data.
  • Optimized for English-language clinical trials.
  • Subject to typical LLM risks such as hallucination and subtle errors.