KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Oct 22, 2024 · License: llama3.1 · Architecture: Transformer

KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024 is an 8-billion-parameter instruction-tuned large language model developed by the Korea Institute of Science and Technology Information (KISTI). Built on a merged base of Meta-Llama-3-8B and KISTI-KONI/KONI-Llama3.1-8B-20240824, it is designed and optimized for tasks in science and technology domains. The model was aligned with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) on specialized datasets, making it well suited to scientific reasoning, mathematical problem solving, and technical writing, and it supports a 32,768-token context length.
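
A minimal loading sketch with Hugging Face transformers is shown below. The repo ID comes from this page; the dtype, device settings, and prompt are illustrative assumptions, not values recommended by KISTI.

```python
# Minimal sketch: load KONI-Llama3.1-8B-Instruct-20241024 with transformers.
# The repo ID is taken from this page; dtype/device choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; use what your hardware supports
    device_map="auto",
)

# Llama 3.1 instruct models ship a chat template; apply it via the tokenizer.
messages = [
    {"role": "user", "content": "Summarize the difference between fission and fusion."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```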

KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024 Overview

KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024 is an 8-billion-parameter instruction-tuned large language model developed by the Korea Institute of Science and Technology Information (KISTI). It is a variant of the Llama 3.1 family, engineered specifically for science and technology applications.

Key Capabilities & Features

  • Domain Specialization: Trained on a large corpus of scientific and technological data, making it highly proficient in these fields.
  • Enhanced Performance: Shows significantly improved benchmark performance over earlier KONI releases.
  • Base Model: Built upon a merged foundation of Meta-Llama-3-8B and KISTI-KONI/KONI-Llama3.1-8B-20240824.
  • Alignment: Uses both Supervised Fine-Tuning (SFT) on approximately 11k examples and Direct Preference Optimization (DPO) on roughly 7k examples for robust instruction following (a minimal DPO sketch follows this list).
  • Multilingual Data: SFT and DPO datasets include internally generated data, publicly available data, and translated/curated data (e.g., from argilla/dpo-mix-7k), with Korean translations where necessary.
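
KISTI has not released its training code, so the following is only a minimal sketch of a DPO pass in the style described above, using the publicly named argilla/dpo-mix-7k dataset with the TRL library. The base model ID is the pre-DPO checkpoint named on this page; all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: KISTI's actual SFT/DPO pipeline is not public.
# Uses TRL's DPOTrainer with the publicly named argilla/dpo-mix-7k dataset;
# every hyperparameter below is an assumption, not a reported value.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "KISTI-KONI/KONI-Llama3.1-8B-20240824"  # pre-DPO base named on this page

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# dpo-mix-7k provides chosen/rejected preference pairs; recent TRL versions
# accept this conversational format directly. Depending on your TRL version,
# you may need to map the data to prompt/chosen/rejected string columns.
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

args = DPOConfig(
    output_dir="koni-dpo-sketch",
    beta=0.1,                       # assumed KL-penalty strength
    per_device_train_batch_size=1,  # assumed; tune to your hardware
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases use tokenizer= instead
)
trainer.train()
```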

Benchmark Performance

Evaluated on LogicKor, a Korean multi-domain instruction-following benchmark scored out of 10, the model achieved an overall score of 8.93, with strong results across categories:

  • Reasoning: 8.15
  • Math: 8.79
  • Writing: 9.22
  • Coding: 9.21
  • Comprehension: 9.65
  • Grammar: 8.57
  • Single-turn: 9.05
  • Multi-turn: 8.81
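
As a consistency check, 8.93 is exactly the mean of the single-turn and multi-turn scores ((9.05 + 8.81) / 2 = 8.93) and matches the mean of the six category scores ((8.15 + 8.79 + 9.22 + 9.21 + 9.65 + 8.57) / 6 ≈ 8.93).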

Ideal Use Cases

This model is particularly well-suited for developers and researchers working on applications requiring deep understanding and generation within scientific, engineering, and technological contexts, especially those involving Korean language data. Its specialized training makes it a strong candidate for tasks like technical documentation, scientific inquiry, code assistance, and complex problem-solving in these domains.
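
As an illustration of the Korean science-domain use cases above, the sketch below sends a Korean physics question through the transformers chat pipeline. The prompt is invented for demonstration, and chat-format pipeline input requires a recent transformers release.

```python
# Illustrative Korean-language query; the prompt is a made-up example.
# Chat-format pipeline input requires a recent transformers release.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024",
    device_map="auto",
)

messages = [
    # "Please briefly explain the Meissner effect in superconductors."
    {"role": "user", "content": "초전도체의 마이스너 효과를 간단히 설명해 주세요."}
]

result = chat(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # assistant reply
```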

Popular Sampler Settings

Featherless users most often tune the following sampler parameters for this model; a hedged request sketch follows the list.

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
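
As a sketch of how these parameters might be passed, the request below assumes an OpenAI-compatible endpoint (Featherless exposes one; the URL here is an assumption). The values are illustrative, not the popular configs referenced above, and the non-standard parameters (top_k, min_p, repetition_penalty) go through extra_body, which not every server accepts.

```python
# Hedged sketch: pass sampler settings through an OpenAI-compatible API.
# The endpoint URL and all sampler values below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024",
    messages=[{"role": "user", "content": "Explain CRISPR in two sentences."}],
    temperature=0.7,            # standard OpenAI sampler parameters
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={                # non-standard parameters; server support varies
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.05,
    },
)
print(response.choices[0].message.content)
```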