burgasdotpro/bgGPT-DeepSeek-R1-Distill-Qwen-7B

Parameters: 7.6B
Precision: FP8
Context length: 131,072 tokens
License: apache-2.0

bgGPT-DeepSeek-R1-Distill-Qwen-7B Overview

This model, developed by burgasdotpro, is a 7.6-billion-parameter language model built on DeepSeek-R1-Distill-Qwen-7B. It has undergone continued pretraining focused on Bulgarian-language data, specifically Wikipedia content (50% and 100% in different pretraining phases).
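
As a rough illustration of how the checkpoint can be used, the sketch below loads it with the standard transformers causal-LM classes and completes a short Bulgarian prompt. This is a minimal sketch, not code from the model card; the dtype, sampling settings, and prompt are illustrative assumptions.

```python
# Minimal usage sketch (not from the original card): load the checkpoint with the
# standard transformers causal-LM classes and complete a Bulgarian prompt.
# The dtype, sampling settings, and prompt below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "burgasdotpro/bgGPT-DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bfloat16 support
    device_map="auto",
)

prompt = "Изкуственият интелект е"  # "Artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```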

Key Capabilities & Performance

  • Bulgarian Language Optimization: The model shows substantial perplexity (PPL) improvements on Bulgarian text: from 179.76 (base model) to 72.63 on short texts, and from 258.56 to 83.96 on long texts. A sketch of one way such a measurement can be made follows this list.
  • Reasoning and Problem Solving: Demonstrates logical reasoning and step-by-step problem solving, for example working through algebraic equations with a detailed thought process.
  • Efficient Training: Training used Unsloth together with Hugging Face's TRL library, making it roughly 2x faster.
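
The card does not publish its evaluation script, so the sketch below shows one common way to compute perplexity: the exponential of the mean token-level negative log-likelihood under the model. The truncation length and the sample text are assumptions; the reported numbers may have been measured differently (e.g. with a sliding window).

```python
# Hedged perplexity sketch: PPL = exp(mean cross-entropy loss) of a text under the model.
# Long texts are simply truncated to max_length here; the card's exact protocol is unknown.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str, max_length: int = 4096) -> float:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    input_ids = enc["input_ids"].to(model.device)
    with torch.no_grad():
        # Passing labels=input_ids makes the causal LM return the mean shifted cross-entropy.
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())

if __name__ == "__main__":
    model_id = "burgasdotpro/bgGPT-DeepSeek-R1-Distill-Qwen-7B"
    tok = AutoTokenizer.from_pretrained(model_id)
    lm = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    sample = "София е столицата на България."  # "Sofia is the capital of Bulgaria."
    print(f"PPL: {perplexity(lm, tok, sample):.2f}")
```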

Use Cases

This model is particularly well suited to applications requiring strong Bulgarian language understanding, generation, and logical reasoning. It can serve as an effective Bulgarian-language assistant for text comprehension, mathematical problem solving, and general conversation.
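
To illustrate the assistant use case, the sketch below formats a Bulgarian question with the tokenizer's chat template and generates an answer. It assumes the model inherits the DeepSeek-R1-Distill chat template (not confirmed by the card), and the question is an invented example; R1-style models typically prepend a reasoning trace to the final answer.

```python
# Hedged conversational sketch: build a chat prompt via the tokenizer's chat template
# (assumed to be inherited from DeepSeek-R1-Distill-Qwen-7B) and generate a reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "burgasdotpro/bgGPT-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # "Solve the equation 2x + 5 = 17 and explain the steps."
    {"role": "user", "content": "Реши уравнението 2x + 5 = 17 и обясни стъпките."}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)

# Print only the newly generated tokens (the model's reasoning and answer).
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```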