dnotitia/Llama-DNA-1.0-8B-Instruct

8B parameters · FP8 · Context: 32768 · License: cc-by-nc-4.0

Overview

dnotitia/Llama-DNA-1.0-8B-Instruct is an 8-billion-parameter bilingual language model developed by Dnotitia Inc. It is built on the Llama architecture and supports an extended context length of 131,072 tokens. The model is optimized for Korean-language tasks, demonstrating strong performance in both understanding and generation, while retaining robust English capabilities.
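
Because the model follows the standard Llama instruct format, inference can use the usual Hugging Face transformers chat-template flow. The snippet below is a minimal sketch rather than the official usage example: the dtype, device placement, and generation settings are assumptions, and the model card should be consulted for recommended prompt formatting and parameters.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dnotitia/Llama-DNA-1.0-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust dtype/quantization to your hardware
    device_map="auto",
)

# Korean prompt, reflecting the model's primary optimization target
# ("What is the capital of South Korea?")
messages = [{"role": "user", "content": "대한민국의 수도는 어디인가요?"}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```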

Key Capabilities & Development

The model was developed through a multi-stage process:

  • Architecture: Based on the Llama family, specifically Llama 3.1 8B Instruct.
  • Bilingual Focus: Optimized for Korean, with strong English support.
  • Training Methodology: Combines model merging via spherical linear interpolation (SLERP; see the sketch after this list), knowledge distillation (KD) from Llama 3.1 405B, and extensive continual pre-training (CPT) on a high-quality Korean dataset, followed by supervised fine-tuning (SFT) and direct preference optimization (DPO) for instruction following.
  • Context Length: Supports a substantial 131,072 tokens, enabling processing of longer inputs.
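
As a rough illustration of the merging step above, SLERP blends two checkpoints along the arc between their weight vectors rather than along a straight line, which preserves the parameter norm better than plain averaging. The sketch below shows a generic per-tensor SLERP; the actual source checkpoints, interpolation factors, and any per-layer schedule used for DNA 1.0 are not documented here and are assumptions for illustration only.

```python
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two same-shaped weight tensors."""
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two flattened weight vectors
    cos_theta = torch.clamp(torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos_theta)
    sin_theta = torch.sin(theta)
    if sin_theta.item() < eps:
        # Nearly parallel vectors: fall back to ordinary linear interpolation
        merged = (1.0 - t) * v0 + t * v1
    else:
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * v0 + (torch.sin(t * theta) / sin_theta) * v1
    return merged.reshape(w0.shape).to(w0.dtype)

# Hypothetical usage: merge two state dicts tensor by tensor with a single factor t
# merged = {name: slerp(base[name], donor[name], t=0.5) for name in base}
```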

Performance Highlights

DNA 1.0 8B Instruct reports leading scores on several Korean benchmarks and remains competitive on English benchmarks:

  • KMMLU: 53.26 (1st place); KMMLU-hard: 29.46 (1st place).
  • KoBEST: 83.40 (1st place).
  • Belebele: 57.99 (1st place).
  • English benchmarks: MMLU-Pro 43.05 (1st place); GSM8K 80.52 (1st place).

Good for

  • Applications requiring high-quality Korean language understanding and generation.
  • Tasks benefiting from a large context window (128k tokens).
  • Instruction-following tasks in both Korean and English.

Limitations

The model may occasionally generate biased, inappropriate, or factually incorrect content, and its responses reflect its training data rather than up-to-date information. Performance can also vary with task complexity.