KISTI-KONI/KONI-Llama3-8B-Instruct-20240729

Available on Hugging Face.

Text Generation · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Jul 27, 2024 · License: llama3 · Architecture: Transformer

KISTI-KONI/KONI-Llama3-8B-Instruct-20240729 is a specialized large language model developed by the Korea Institute of Science and Technology Information (KISTI) and built on a merged Llama 3 8B base. It is trained on a large corpus of scientific and technological data, making it well suited to tasks in those fields. The model is instruction-tuned via supervised fine-tuning (SFT) and Direct Preference Optimization (DPO), and it performs strongly in reasoning, math, writing, and coding, achieving an overall score of 8.21 on the LogicKor benchmark, the best result among publicly available 8B models on that leaderboard.


KISTI-KONI/KONI-Llama3-8B-Instruct-20240729 Overview

KISTI-KONI/KONI-Llama3-8B-Instruct-20240729 is a specialized large language model developed by the Korea Institute of Science and Technology Information (KISTI). It is built upon a merged base model, KONI-Llama3-8B-Merged-20240724, which combines Meta-Llama-3-8B and KISTI-KONI/KONI-Llama3-8B-20240630. This model is specifically designed and optimized for science and technology domains.
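Since this is a standard Llama 3 checkpoint on the Hugging Face Hub, it can presumably be loaded with the `transformers` library. The sketch below is a minimal, generic example using the usual chat-template API; the system prompt text and generation settings are illustrative assumptions, not taken from the model card.

```python
# Minimal inference sketch for KONI-Llama3-8B-Instruct-20240729.
# Loading the full model requires a GPU with roughly 16 GB of memory in bf16.

MODEL_ID = "KISTI-KONI/KONI-Llama3-8B-Instruct-20240729"


def build_messages(question: str) -> list[dict]:
    """Build a chat-format message list (the system prompt here is illustrative)."""
    return [
        {
            "role": "system",
            "content": "You are a helpful assistant specialized in science and technology.",
        },
        {"role": "user", "content": question},
    ]


def generate(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and answer `question` using the tokenizer's chat template."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For Korean-language science and technology questions, `question` can of course be written in Korean, which is the model's primary target language.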

Key Features & Training

  • Domain Specialization: Explicitly trained on a vast and specialized corpus of scientific and technological data, making it highly effective for tasks within these fields.
  • Enhanced Performance: Represents a significant improvement over KISTI's initial KONI release from December 2023.
  • Instruction Tuning: Undergoes Supervised Fine-Tuning (SFT) on approximately 11k data points, including internally generated datasets and publicly available datasets translated into Korean.
  • Preference Alignment: Utilizes Direct Preference Optimization (DPO) with around 7k curated and translated data points from argilla/dpo-mix-7k.
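The DPO stage described above can be sketched with the `trl` library. This is a generic illustration of the technique, not KISTI's actual training code: the hyperparameters are hypothetical, the preference data must be shaped into `trl`'s expected `prompt`/`chosen`/`rejected` columns, and exact constructor arguments vary between `trl` versions.

```python
def to_preference_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Shape one preference pair the way trl's DPOTrainer expects it."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def run_dpo(sft_model_id: str, pairs: list[dict]):
    """Sketch of a DPO run on an SFT checkpoint (hypothetical hyperparameters)."""
    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model = AutoModelForCausalLM.from_pretrained(sft_model_id)
    tokenizer = AutoTokenizer.from_pretrained(sft_model_id)
    train_dataset = Dataset.from_list(pairs)

    # beta controls how far the policy may drift from the reference model.
    args = DPOConfig(
        output_dir="koni-dpo",
        beta=0.1,
        per_device_train_batch_size=2,
        num_train_epochs=1,
    )
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        processing_class=tokenizer,
    )
    trainer.train()
    return trainer
```

In the recipe above, `pairs` would hold the ~7k curated and translated records derived from argilla/dpo-mix-7k, each mapped through `to_preference_record`.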

Benchmark Performance

The model demonstrates strong capabilities across various metrics on the LogicKor leaderboard, where it holds the best performance among publicly available 8B models as of July 30, 2024:

  • Overall Score: 8.21
  • Reasoning: 6.57
  • Math: 8.00
  • Writing: 8.92
  • Coding: 8.85
  • Comprehension: 9.85

Ideal Use Cases

This model is particularly well-suited for applications requiring deep understanding and generation within scientific and technological contexts, especially for Korean language tasks. Its strong performance in coding, writing, and comprehension makes it valuable for technical documentation, scientific research assistance, and specialized content creation.