UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B

Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 32k · Published: Feb 12, 2025 · License: MIT · Architecture: Transformer · Open Weights

UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B is a 70-billion-parameter language model developed by UNIVA and Bllossom, based on DeepSeek-R1-distill-Llama-70B. It is post-trained specifically to enhance reasoning performance in Korean, addressing a limitation of its base model, which was trained primarily on English and Chinese data. The model improves Korean inference by reasoning internally in English and generating the response in the language of the input, making it suitable for complex reasoning tasks in Korean.

Overview

UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B is a 70-billion-parameter model developed jointly by UNIVA and Bllossom. It is built on the DeepSeek-R1-distill-Llama-70B base model, with a primary focus on improving inference performance in Korean. The base model exhibited significant performance degradation when reasoning in Korean, a limitation this model is designed to overcome.

Key Capabilities

  • Enhanced Korean Reasoning: The model is post-trained specifically to improve reasoning in Korean. It carries out its internal chain of thought in English and generates the final response in the language of the input, significantly boosting Korean inference accuracy (see the usage sketch after this list).
  • Diverse Training Data: Training involved Korean and English reasoning datasets, expanding beyond the STEM-focused data typically used for DeepSeek-R1 models to include a wider range of domains.
  • Distillation for Performance: Utilizes a post-training process with proprietary reasoning data to effectively distill the superior reasoning and Korean processing abilities of larger models into the DeepSeek-R1-distill-Llama-70B base.
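As a rough illustration of this behavior, the sketch below loads the model with the Hugging Face transformers library and sends it a Korean prompt ("Please prove the Pythagorean theorem."). The dtype, device mapping, and generation settings are illustrative assumptions, not values prescribed by the model card.

```python
# Minimal usage sketch; assumes transformers and a multi-GPU host
# (a 70B model in bf16 needs roughly 140 GB of GPU memory).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; quantized variants reduce memory
    device_map="auto",           # shard across available GPUs
)

# Korean prompt: "Please prove the Pythagorean theorem."
messages = [{"role": "user", "content": "피타고라스 정리를 증명해 주세요."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,  # reasoning traces can be long
    do_sample=True,
    temperature=0.6,      # assumed setting, not from the model card
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With DeepSeek-R1-derived models, the decoded output typically opens with an English reasoning trace (often wrapped in <think> tags) followed by the final answer in the prompt's language; the exact formatting depends on the model's chat template.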

Benchmarks

Comparative benchmarks show that DeepSeek-llama3.3-Bllossom-70B achieves improved scores on Korean-specific reasoning tasks (AIME24_ko: 62.22, MATH500_ko: 88.40) relative to its base model, DeepSeek-R1-Distill-Llama-70B, while maintaining strong English performance.

Licensing

This model and its code repository are licensed under the MIT License, which permits commercial use, modification, and derivative works. It is derived from DeepSeek-R1-Distill-Llama-70B, which is itself licensed under the Llama 3.3 license.