yasserrmd/kallamni-4b-v1

Warm
Public
4B
BF16
40960
License: cc-by-nc-4.0
Hugging Face
Overview

Kallamni-4B-v1: Authentic Emirati Arabic Conversational Model

Kallamni-4B-v1 is a 4 billion parameter language model developed by yasserrmd, meticulously fine-tuned to specialize in natural spoken Emirati Arabic. Unlike general Arabic models, Kallamni-4B focuses on capturing the nuances of daily UAE dialect, including its specific vocabulary, phrasing, and cultural references, rather than Modern Standard Arabic (MSA).

Key Capabilities & Features

  • Authentic Emirati Dialect Generation: Designed to produce text that sounds like genuine daily UAE conversation, incorporating words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
  • Conversational Fluidity: Builds upon previous versions (1.2B, 2.6B) to enhance dialect fidelity, consistency, and conversational flow.
  • Specialized Training Data: Fine-tuned on 58,000 synthetic Emirati conversation samples, manually filtered for dialect accuracy.
  • Tokenizer Extension: Includes Emirati-specific tokens to preserve dialect word merges, crucial for accurate representation.
  • High Human Evaluation Scores: Consistently rated by human evaluators as >90% authentic Emirati dialect.

Ideal Use Cases

  • Emirati-specific Chatbots: Creating conversational agents that interact naturally in the UAE dialect.
  • Content Generation: Producing dialogues, social media posts, or narratives that resonate with an Emirati audience.
  • Cultural Immersion Applications: Tools for learning or practicing authentic Emirati spoken Arabic.

Limitations & Ethical Use

Licensed under CC-BY-NC-4.0, the model does not collect personal user data. Users are advised to use it responsibly, avoiding the generation of misinformation, impersonation, or harmful content. Outputs published publicly should cite their AI-generated nature.