Qwen-Urdu-Shaheen-7B-Instruct: Bridging Urdu Heritage and AI
Khurram123/Qwen-Urdu-Shaheen-7B-Instruct-v1 is a specialized 7 billion parameter Urdu Language Model built upon the Qwen 2.5 7B Instruct architecture. Fine-tuned by Khurram123 using Unsloth for efficient training, this model stands out for its deep cultural and linguistic understanding of Urdu.
Key Capabilities
- Extensive Urdu Knowledge: Fine-tuned on a massive 1.83 million row corpus, encompassing classical literature, contemporary news, and instructional data.
- Literary Specialization: Proficient in the philosophy of Allama Iqbal and the poetry of Ghalib and Ahmed Faraz, offering nuanced analysis and generation.
- Instruction Following: Optimized with the Alif-Instruct dataset for precise command execution in Urdu.
- Modern Context Integration: Incorporates the Lughat News Corpus for contemporary vocabulary and news synthesis.
- OCR Synergized: Capable of processing and generating poetic couplets derived from Urdu Poetry OCR datasets, including Nastaliq script.
Good For
- Applications requiring deep understanding and generation of Urdu literary content.
- Conversational AI systems needing precise Urdu instruction following.
- Research and development in Urdu natural language processing, especially for historical and poetic texts.
- Tasks involving the synthesis of modern Urdu prose and news-related content.