The jamesjunyuguo/verbal-calibrate model is an 8-billion-parameter model fine-tuned from Meta's Llama-3.1-8B-Instruct for factual question answering (QA) with explicit verbalized confidence scores. It gives a step-by-step answer followed by a decimal confidence score between 0 and 1, which lets an adaptive Retrieval-Augmented Generation (RAG) pipeline decide when retrieval is needed. The model targets uncertainty quantification and selective retrieval, making it suitable for confidence-aware QA and for research into uncertainty calibration.
verbal-calibrate: Confidence-Aware QA for Adaptive RAG
verbal-calibrate is an 8-billion-parameter model fine-tuned from meta-llama/Llama-3.1-8B-Instruct. Its distinguishing feature is that it states a calibrated verbal confidence alongside each answer to a factual question.
Key Capabilities & Features
- Verbalized Confidence: Provides a decimal confidence score (0-1) with each answer, reflecting the model's uncertainty.
- Adaptive Retrieval Gating: Designed for adaptive RAG, where a low confidence score (e.g., < 0.5) can trigger external retrieval (like BM25) for a second-pass generation.
- Step-by-Step Reasoning: Answers factual questions by first reasoning through the problem before stating the final answer and confidence.
- Targeted Training: Supervised fine-tuning on multi-hop QA datasets (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA), followed by calibration to align expressed confidence with empirical accuracy.
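The gating behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Confidence: 0.73` output convention, the helper names `parse_confidence` and `answer_with_gating`, and the `generate` / `retrieve` callables are all assumptions introduced here. The 0.5 threshold is the example value mentioned in the list.

```python
import re
from typing import Callable, List, Optional, Tuple

# ASSUMPTION: the model's reply contains something like "Confidence: 0.73".
# The exact output format is not specified here, so adjust the pattern as needed.
_CONF_RE = re.compile(r"confidence\s*[:=]?\s*([01](?:\.\d+)?)", re.IGNORECASE)


def parse_confidence(text: str) -> Optional[float]:
    """Return the last confidence value found in `text`, or None if absent/invalid."""
    matches = _CONF_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])
    return value if 0.0 <= value <= 1.0 else None


def answer_with_gating(
    question: str,
    generate: Callable[[str, Optional[List[str]]], str],  # hypothetical model wrapper
    retrieve: Callable[[str], List[str]],                 # hypothetical retriever, e.g. BM25
    threshold: float = 0.5,
) -> Tuple[str, bool]:
    """First pass without context; retrieve and regenerate only if confidence is low.

    Returns (final_response, retrieval_was_triggered).
    """
    first = generate(question, None)
    confidence = parse_confidence(first)
    # An unparseable confidence is treated as low confidence, so retrieval runs.
    if confidence is not None and confidence >= threshold:
        return first, False
    passages = retrieve(question)
    return generate(question, passages), True
```

Treating a missing or malformed score as "low confidence" is a conservative design choice: it costs an extra retrieval call but avoids trusting an answer whose reliability could not be read.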
Performance Highlights
Accuracy and retrieval trigger rate vary widely across QA datasets, which is consistent with the model requesting retrieval selectively. On TriviaQA, it reaches an exact match (EM) of 53.2 and an F1 of 62.5 with a 28.8% trigger rate; on the multi-hop MuSiQue benchmark, it reaches an EM of 11.8 and an F1 of 18.8 with a 76.8% trigger rate.
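As a rough illustration of how a trigger rate can be computed, the sketch below assumes it is the fraction of questions whose first-pass confidence falls below the gating threshold. That definition, the function name `trigger_rate`, and the sample confidences are assumptions made for illustration; they are not taken from the reported evaluation.

```python
from typing import Sequence


def trigger_rate(confidences: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of questions whose confidence is below `threshold` (retrieval triggered)."""
    if not confidences:
        raise ValueError("need at least one confidence score")
    triggered = sum(1 for c in confidences if c < threshold)
    return triggered / len(confidences)


# Made-up confidences for four questions; two fall below 0.5.
example_rate = trigger_rate([0.9, 0.2, 0.7, 0.4])
print(f"trigger rate: {example_rate:.1%}")
```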
Ideal Use Cases
- Adaptive RAG Pipelines: Dynamically decide when to perform retrieval based on the model's self-assessed confidence.
- Confidence-Aware Factual QA: Applications requiring not just an answer, but also an indication of the answer's reliability.
- Uncertainty Calibration Research: A valuable tool for studying and improving uncertainty quantification in LLMs.
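For calibration studies, one common way to check whether stated confidence matches empirical accuracy is expected calibration error (ECE). The sketch below is a plain-Python version with equal-width bins. It is a standard metric chosen here for illustration, not necessarily the one used to train or evaluate this model, and the function name and bin count are assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width confidence bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 means stated confidences track accuracy closely; for example, answers given with confidence 0.8 should be correct about 80% of the time.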