sjbaek/gemma2-2b-it-korean-dialect

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2.6BQuant:BF16Ctx Length:8kPublished:Sep 22, 2024License:mitArchitecture:Transformer0.0K Open Weights Warm

sjbaek/gemma2-2b-it-korean-dialect is a 2.6 billion parameter Gemma2-based instruction-tuned model developed by sjbaek, specifically fine-tuned using QLoRa for Korean dialect translation. It specializes in converting Jeju dialect to standard Korean and vice-versa, leveraging a small LLM for cost-effective performance. This model is designed for educators, linguists, and developers working on Korean dialect recognition and translation tools.

Loading preview...

Model Overview

sjbaek/gemma2-2b-it-korean-dialect is a 2.6 billion parameter model built upon the Gemma2-2b-it architecture, fine-tuned by sjbaek using QLoRa. Its primary function is to translate between Korean dialects and standard Korean, specifically focusing on the Jeju dialect in its current version. The model aims to provide effective dialect conversion capabilities using a smaller LLM, offering a cost-efficient solution.

Key Capabilities

  • Bidirectional Translation: Converts Jeju dialect to standard Korean and standard Korean to Jeju dialect.
  • Specialized Fine-tuning: Utilizes QLoRa for efficient fine-tuning on specific dialect datasets.
  • Small LLM Advantage: Achieves dialect conversion performance with a smaller model size, beneficial for resource-constrained applications.

Training Data

The model was trained using the AI_HUB Middle-Aged and Elderly Korean Dialect Data, which includes data for Chungcheong, Jeolla, and Jeju dialects.

Limitations and Future Plans

Currently, the model's performance is optimized for the Jeju dialect. Future versions are planned to expand support for other Korean dialects, including Chungcheong (v0.3.0), Jeolla (v0.4.0), Gyeongsang (v0.5.0), and Gangwon (v1.0.0).

Use Cases

This model is suitable for developers, educators, and linguists creating tools for Korean dialect understanding, speech recognition, and translation, particularly for the Jeju region.