davidkim205/keval-2-1b

Hugging Face · Text Generation
Model Size: 1B · Quantization: BF16 · Context Length: 32k · Architecture: Transformer

davidkim205/keval-2-1b is a 1 billion parameter evaluation model built on the Llama-3.2-1B architecture, specifically designed for assessing Korean language models. It takes an LLM-as-a-judge approach and was trained with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) on the custom Ko-bench dataset. The model excels at evaluating Korean linguistic nuances, multi-turn conversation ability, and instruction adherence, offering an alternative to traditional evaluation methods.

Model Overview

keval-2-1b is a specialized 1 billion parameter model developed by davidkim205 for evaluating Korean language models. It employs an "LLM-as-a-judge" methodology, replacing the traditional reliance on proprietary models such as ChatGPT for evaluation. The model is built on the Llama-3.2-1B base architecture and was trained with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO).
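
As a rough illustration of the judge workflow, the sketch below loads the model with Hugging Face transformers and asks it to rate a single Korean answer. The judge prompt wording is a hypothetical stand-in for the actual Ko-bench template, and the snippet assumes the tokenizer ships a Llama-3.2-style chat template.

```python
# Minimal sketch: invoke keval-2-1b as a judge via Hugging Face transformers.
# The prompt layout below is illustrative -- the exact Ko-bench template
# used during training is an assumption, not taken from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/keval-2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

judge_prompt = (
    "[Question]\n한국의 수도는 어디인가요?\n\n"
    "[Assistant's Answer]\n한국의 수도는 서울입니다.\n\n"
    "Rate the answer on a scale of 1 to 10 and explain your reasoning."
)

# Assumes a Llama-3.2-style chat template is bundled with the tokenizer.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": judge_prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```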

Key Capabilities

  • Korean Language Evaluation: Specifically designed to assess Korean linguistic nuances.
  • LLM-as-a-Judge: Utilizes an AI model to evaluate the quality of other Korean LLM responses.
  • Custom Dataset Training: Trained on the Ko-bench dataset, inspired by MT-bench but tailored for Korean, covering diverse user scenarios.
  • Multi-turn Conversation Assessment: Capable of evaluating multi-turn dialogue quality (see the sketch after this list).
  • Instruction Adherence: Assesses how well models follow given instructions.

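Multi-turn assessment in the MT-bench tradition typically shows the judge the full dialogue before asking it to rate the final turn. The prompt layout below is an assumption for illustration, not the exact template used in training; the model is loaded the same way as in the earlier sketch.

```python
# Hypothetical multi-turn judge prompt: the whole two-turn exchange is shown
# to the judge, which rates the assistant's second answer in context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/keval-2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

multi_turn_prompt = (
    "[Question 1]\n한국에서 가장 높은 산은 무엇인가요?\n\n"
    "[Answer 1]\n한라산입니다.\n\n"
    "[Question 2]\n그 산은 어느 지역에 있나요?\n\n"
    "[Answer 2]\n제주도에 있습니다.\n\n"
    "Rate the assistant's second answer from 1 to 10, "
    "taking consistency with the first turn into account."
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": multi_turn_prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
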
Use Cases

  • Benchmarking Korean LLMs: Ideal for researchers and developers needing to objectively score Korean language models.
  • Quality Assurance: Can be integrated into development pipelines to ensure high-quality outputs from Korean LLMs.
  • Comparative Analysis: Provides a structured way to compare the performance of different Korean LLMs on specific tasks, as sketched below.
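
For comparative analysis, one possible pattern is to score each candidate answer with the judge and parse out a numeric rating. Both the prompt wording and the "Rating: N" output convention below are assumptions; adjust them to whatever format the model actually emits.

```python
# Illustrative comparison loop: score two candidate answers with keval-2-1b
# and compare the parsed ratings. Prompt wording and the "Rating: N" regex
# are assumptions about the judge's output format, not documented behavior.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/keval-2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def judge(question: str, answer: str) -> float:
    # Hypothetical prompt; the trained Ko-bench template may differ.
    prompt = (
        f"[Question]\n{question}\n\n[Assistant's Answer]\n{answer}\n\n"
        "Rate the answer from 1 to 10 in the form 'Rating: N'."
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    text = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    # Extract the numeric rating; NaN if the judge deviates from the format.
    match = re.search(r"Rating:\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else float("nan")

question = "한국의 수도는 어디인가요?"
candidates = {
    "model_a": "한국의 수도는 서울입니다.",
    "model_b": "한국의 수도는 부산입니다.",
}
scores = {name: judge(question, answer) for name, answer in candidates.items()}
print(scores)  # higher score -> better answer under the judge
```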