Model Overview
HiTZ/Llama-3.1-8B-Instruct-multi-truth-judge is an 8 billion parameter LLM-as-a-Judge, fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct by the HiTZ Center and collaborators. Its primary function is to evaluate the truthfulness of text generated by other language models, extending beyond English to include Basque, Catalan, Galician, and Spanish.
Key Capabilities
- Multilingual Truthfulness Assessment: Judges model outputs for truthfulness across five languages (English, Basque, Catalan, Galician, Spanish).
- LLM-as-a-Judge Framework: Operates by taking a question, reference answer, and model-generated answer to produce a truthfulness judgment.
- Research-Backed: Developed based on the research presented in the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" (arXiv:2502.09387).
Good for
- Evaluating LLMs: Directly usable for assessing the truthfulness of language model generations, especially in multilingual contexts.
- Automated Fact-Checking Research: Can serve as a component in systems for automated fact-checking or content moderation research.
- Benchmarking: Ideal for evaluating models against the TruthfulQA benchmark, particularly its multilingual extensions.
Limitations
Users should be aware that the model's performance can vary across languages and question types (universal vs. culturally specific). It is not designed for general text generation or providing factual information directly, and its judgments should be cross-verified for critical applications.