biodatlab/ec-raft
EC-RAFT is an 8 billion parameter Retrieval-Augmented Fine-Tuning (RAFT) model, based on LLaMA-3.1-8B-Instruct, developed by biodatlab. It is specifically fine-tuned to automatically generate structured, high-quality clinical trial eligibility criteria (EC) from trial titles and descriptions. The model integrates domain-specific retrieval with synthesized intermediate reasoning steps, enabling it to produce clinically relevant and contextually appropriate EC sets. It outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines in this specialized task.
Loading preview...
EC-RAFT: Automated Clinical Trial Eligibility Criteria Generation
EC-RAFT is an 8 billion parameter model developed by biodatlab, fine-tuned from LLaMA-3.1-8B-Instruct using a Retrieval-Augmented Fine-Tuning (RAFT) approach. Its core function is to automatically generate structured, high-quality clinical trial eligibility criteria (EC) based on trial titles and descriptions.
Key Capabilities & Features
- Specialized EC Generation: Designed specifically for creating clinically relevant and contextually appropriate eligibility criteria.
- Retrieval-Augmented: Integrates domain-specific retrieval and synthesized intermediate reasoning steps, generated using Gemini-1.5-flash-002, for enhanced accuracy.
- Performance: Outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines, as well as fine-tuned LLaMA and Meditron models, on EC generation tasks.
- Clinically Validated: Achieves high correlation with human physician evaluations, with a Mean LLM-as-a-Judge Score of 1.7150 (0–3 scale) and a BERTScore of 86.23.
- Fine-tuning: Utilizes Low-Rank Adaptation (LoRA) and is trained on a comprehensive dataset from ClinicalTrials.gov (267,347 trials).
Intended Use Cases
- Researchers, Trial Designers, and Sponsors: Assists in drafting clinical trial eligibility criteria, reducing manual effort and improving consistency.
- Automation: Automates EC generation for integration into trial registry platforms, clinical trial matching systems, and EC recommendation tools.
Limitations
- Requires human validation before clinical use.
- Primarily trained on public ClinicalTrials.gov data, which may limit generalization to rare diseases, specialized trial designs, or non-public data.
- Optimized for English-language clinical trials.
- Subject to typical LLM risks such as hallucination and subtle errors.