UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B

8B parameters · FP8 · 32,768-token context · License: MIT
Overview

DeepSeek-llama3.1-Bllossom-8B: Enhanced Korean Reasoning

DeepSeek-llama3.1-Bllossom-8B is an 8-billion-parameter model developed by UNIVA and Bllossom on top of DeepSeek-R1-Distill-Llama-8B. It was designed to overcome a key limitation of the base model: degraded reasoning performance in Korean, a consequence of the base model's training data being concentrated on English and Chinese.
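A minimal loading sketch with the Hugging Face transformers library is shown below; the sampling parameters and the example prompt are illustrative assumptions, not officially recommended settings for this model:

```python
# Minimal usage sketch (assumes transformers and accelerate are installed).
# Generation settings below are illustrative, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Korean prompt: "Prove that there are infinitely many primes."
messages = [{"role": "user", "content": "소수가 무한히 많음을 증명하세요."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```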

Key Capabilities & Features

  • Improved Korean Reasoning: Addresses the performance degradation of the base model in Korean environments.
  • Internal English Reasoning: Uses an approach where internal thought processes are conducted in English, while the final response is generated in the input language (e.g., Korean); see the sketch after this list.
  • Post-training with Reasoning Data: Optimized through post-training using proprietary Korean and English reasoning datasets, including diverse domains beyond the STEM focus of the original DeepSeek-R1 models.
  • Distillation Method: Employs distillation techniques to transfer the reasoning capabilities of larger models to the DeepSeek-R1-Distill-Llama-8B base.
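DeepSeek-R1-style distill models typically emit their chain of thought between `<think>` and `</think>` tags before the final answer. Assuming this model inherits that convention from its base model family (an assumption, not something the card confirms), the English reasoning can be separated from the Korean reply like this:

```python
import re

def split_reasoning(generated_text: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, answer).

    Assumes the <think>...</think> convention of the DeepSeek-R1
    distill family; if no think block is found, the whole text is
    treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", generated_text, re.DOTALL)
    if match is None:
        return "", generated_text.strip()
    reasoning = match.group(1).strip()              # internal English reasoning
    answer = generated_text[match.end():].strip()   # final reply in the input language
    return reasoning, answer

# Hypothetical output shape: English reasoning, then a Korean answer.
reasoning, answer = split_reasoning(
    "<think>Use Euclid's proof by contradiction...</think>소수는 무한히 많습니다."
)
```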

Performance & Benchmarks

Benchmarks show significant improvements in Korean reasoning tasks compared to its base model:

| Benchmark   | DeepSeek-R1-Distill-Llama-8B (base) | DeepSeek-llama3.1-Bllossom-8B |
|-------------|-------------------------------------|-------------------------------|
| AIME24_ko   | 25.56                               | 36.67                         |
| MATH500_ko  | 63.40                               | 78.07                         |

Good for

  • Applications requiring robust Korean language inference and reasoning.
  • Multilingual use cases, particularly those that benefit from reasoning carried out internally in English and delivered in Korean.
  • Developers seeking a model with enhanced problem-solving abilities in a Korean context.