NCSOFT/Llama-3-OffsetBias-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Jul 11, 2024 · License: llama3 · Architecture: Transformer

NCSOFT/Llama-3-OffsetBias-8B is an 8 billion parameter generative judge model developed by NC Research, built upon Meta Llama-3-8B-Instruct. It is specifically fine-tuned for pairwise preference evaluation tasks, designed to be robust against common evaluation biases. This model excels at identifying the better of two outputs for a given instruction, making it ideal for debiased AI model evaluation.


Llama-3-OffsetBias-8B: A Debiased Generative Judge Model

NCSOFT/Llama-3-OffsetBias-8B is an 8 billion parameter generative judge model developed by NC Research, fine-tuned from Meta Llama-3-8B-Instruct. Its primary purpose is to perform pairwise preference evaluation, acting as a robust evaluator that mitigates common biases found in other evaluation models. This model was introduced in the paper "OffsetBias: Leveraging Debiased Data for Tuning Evaluators" and is designed to select the superior output between two options for a given instruction.

Key Capabilities & Features

  • Bias Robustness: Specifically trained to be more resilient to various evaluation biases, ensuring fairer and more objective assessments.
  • Pairwise Preference Evaluation: Given an instruction and two potential outputs (a) and (b), the model predicts which output is better.
  • Instruction-Tuned: Fine-tuned on a diverse set of datasets including UltraFeedback, HelpSteer, hh-rlhf, PKU-SafeRLHF, and the NCSOFT/offsetbias dataset introduced alongside the model.
  • Specific Prompt Format: Requires a precise prompt template for optimal performance, ensuring consistent and accurate evaluation.

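Pairwise judging boils down to formatting the instruction and the two candidate outputs into the model's expected prompt and asking it to pick one. The sketch below illustrates that flow; the prompt wording and the `build_judge_prompt` helper are assumptions for illustration, not the model's official template, which should be taken verbatim from the model card.

```python
# Sketch of pairwise preference judging with Llama-3-OffsetBias-8B.
# NOTE: the prompt wording below is illustrative only; substitute the exact
# template from the NCSOFT/Llama-3-OffsetBias-8B model card in real use.

def build_judge_prompt(instruction: str, output_a: str, output_b: str) -> str:
    """Format an instruction and two candidate outputs for pairwise judging."""
    return (
        "You are a helpful assistant in evaluating the quality of the outputs "
        "for a given instruction. Select the better output.\n\n"
        f"Instruction: {instruction}\n\n"
        f"Output (a): {output_a}\n\n"
        f"Output (b): {output_b}\n\n"
        "Which is better, Output (a) or Output (b)?"
    )

prompt = build_judge_prompt(
    "Summarize photosynthesis in one sentence.",
    "Photosynthesis is the process by which plants convert light into chemical energy.",
    "Plants do stuff with light.",
)
print(prompt)
```

To run the actual judgment, the prompt would be wrapped in a chat turn (e.g. via `tokenizer.apply_chat_template` from the `transformers` library) and passed to the model with `generate`, after which the generated text indicates the preferred output.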
Use Cases

  • Automated Model Evaluation: Ideal for developers and researchers needing an objective and debiased method to compare the quality of responses from different AI models.
  • Quality Assurance: Can be integrated into pipelines to automatically identify preferred outputs based on specific criteria.
  • Research on Evaluation Biases: Useful for studying and mitigating biases in LLM evaluation processes.

Evaluation results on LLMBar and EvalBiasBench demonstrate its effectiveness across a range of evaluation metrics, particularly its strong performance in mitigating biases related to output length, concreteness, and empty references.
