davidkim205/keval-2-1b

Hugging Face · Text Generation
Model Size: 1B · Quantization: BF16 · Context Length: 32k · Architecture: Transformer

davidkim205/keval-2-1b is a 1 billion parameter evaluation model built on the Llama-3.2-1B architecture, specifically designed for assessing Korean language models. It takes an LLM-as-a-judge approach and was trained with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) on the custom Ko-bench dataset. The model excels at evaluating Korean linguistic nuances, multi-turn conversation ability, and instruction adherence, offering an alternative to traditional evaluation methods.

Model Overview

keval-2-1b is a specialized 1 billion parameter model developed by davidkim205 for evaluating Korean language models. It employs an "LLM-as-a-judge" methodology, replacing the traditional reliance on proprietary models such as ChatGPT for evaluation. The model is built on the Llama-3.2-1B base architecture and was trained with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO).
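
As a rough illustration of the judge workflow, the sketch below loads the model with Hugging Face transformers and asks it to rate a single Korean answer. The judge prompt wording is a hypothetical stand-in for the actual Ko-bench template, and the snippet assumes the tokenizer ships a Llama-3.2-style chat template.

```python
# Minimal sketch: invoke keval-2-1b as a judge via Hugging Face transformers.
# The prompt layout below is illustrative -- the exact Ko-bench template
# used during training is an assumption, not taken from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/keval-2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

judge_prompt = (
    "[Question]\n한국의 수도는 어디인가요?\n\n"
    "[Assistant's Answer]\n한국의 수도는 서울입니다.\n\n"
    "Rate the answer on a scale of 1 to 10 and explain your reasoning."
)

# Assumes a Llama-3.2-style chat template is bundled with the tokenizer.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": judge_prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```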

Key Capabilities

  • Korean Language Evaluation: Specifically designed to assess Korean linguistic nuances.
  • LLM-as-a-Judge: Utilizes an AI model to evaluate the quality of other Korean LLM responses.
  • Custom Dataset Training: Trained on the Ko-bench dataset, inspired by MT-bench but tailored for Korean, covering diverse user scenarios.
  • Multi-turn Conversation Assessment: Capable of evaluating multi-turn dialogue quality (see the sketch after this list).
  • Instruction Adherence: Assesses how well models follow given instructions.

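Multi-turn assessment in the MT-bench tradition typically shows the judge the full dialogue before asking it to rate the final turn. The prompt layout below is an assumption for illustration, not the exact template used in training; the model is loaded the same way as in the earlier sketch.

```python
# Hypothetical multi-turn judge prompt: the whole two-turn exchange is shown
# to the judge, which rates the assistant's second answer in context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/keval-2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

multi_turn_prompt = (
    "[Question 1]\n한국에서 가장 높은 산은 무엇인가요?\n\n"
    "[Answer 1]\n한라산입니다.\n\n"
    "[Question 2]\n그 산은 어느 지역에 있나요?\n\n"
    "[Answer 2]\n제주도에 있습니다.\n\n"
    "Rate the assistant's second answer from 1 to 10, "
    "taking consistency with the first turn into account."
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": multi_turn_prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
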
Use Cases

  • Benchmarking Korean LLMs: Ideal for researchers and developers needing to objectively score Korean language models.
  • Quality Assurance: Can be integrated into development pipelines to ensure high-quality outputs from Korean LLMs.
  • Comparative Analysis: Provides a structured way to compare the performance of different Korean LLMs on specific tasks, as sketched below.
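
For comparative analysis, one possible pattern is to score each candidate answer with the judge and parse out a numeric rating. Both the prompt wording and the "Rating: N" output convention below are assumptions; adjust them to whatever format the model actually emits.

```python
# Illustrative comparison loop: score two candidate answers with keval-2-1b
# and compare the parsed ratings. Prompt wording and the "Rating: N" regex
# are assumptions about the judge's output format, not documented behavior.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/keval-2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def judge(question: str, answer: str) -> float:
    # Hypothetical prompt; the trained Ko-bench template may differ.
    prompt = (
        f"[Question]\n{question}\n\n[Assistant's Answer]\n{answer}\n\n"
        "Rate the answer from 1 to 10 in the form 'Rating: N'."
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    text = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    # Extract the numeric rating; NaN if the judge deviates from the format.
    match = re.search(r"Rating:\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else float("nan")

question = "한국의 수도는 어디인가요?"
candidates = {
    "model_a": "한국의 수도는 서울입니다.",
    "model_b": "한국의 수도는 부산입니다.",
}
scores = {name: judge(question, answer) for name, answer in candidates.items()}
print(scores)  # higher score -> better answer under the judge
```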