diabolic6045/Sanskrit-Qwen2.5-7B-base

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Sep 11, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

diabolic6045/Sanskrit-Qwen2.5-7B-base is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B for Sanskrit text generation and understanding. Developed by diabolic6045, it was trained on more than 664,000 clean Sanskrit texts and uses a tokenizer extended with Sanskrit-specific punctuation. It is intended for generating coherent Sanskrit verses and prose, completing partial Sanskrit texts, and modeling Sanskrit grammar, which makes it suited to specialized Sanskrit language modeling tasks.

Sanskrit Qwen2.5-7B Base Model Overview

This model, developed by diabolic6045, is a specialized 7.6 billion parameter language model built on the Qwen2.5-7B architecture. It was fine-tuned with LoRA on a dataset of 664,104 clean Sanskrit texts from classical literature, including the Bhagavata Purana, the Mahabharata, and the Ramayana. The tokenizer has been extended with the Sanskrit punctuation marks danda (।) and double danda (॥) to improve generation quality in Devanagari script.
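To make the role of the two punctuation marks concrete, here is a minimal sketch that splits Devanagari text at them. It is an illustration only and not the model's tokenizer: the function names `split_verses` and `split_half_verses` are hypothetical, and the sketch assumes plain verses without numbering between double dandas.

```python
DANDA = "\u0964"         # । closes a half-verse or a prose sentence
DOUBLE_DANDA = "\u0965"  # ॥ closes a full verse


def split_verses(text: str) -> list[str]:
    """Split Devanagari text into verses at each double danda (॥)."""
    return [part.strip() for part in text.split(DOUBLE_DANDA) if part.strip()]


def split_half_verses(verse: str) -> list[str]:
    """Split one verse into half-verses at each single danda (।)."""
    return [part.strip() for part in verse.split(DANDA) if part.strip()]


# A verse written with a danda after the first half and a double danda at the end.
sample = "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः । मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥"
for verse in split_verses(sample):
    print(split_half_verses(verse))
```

A helper like this could also be used to post-process generated text, for example to keep only complete verses that end in a double danda.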

Key Capabilities

  • Sanskrit Text Generation: Generates coherent Sanskrit verses and prose.
  • Text Completion: Completes partial Sanskrit sentences and verses.
  • Language Understanding: Demonstrates understanding of Sanskrit grammar and structure.
  • Punctuation Handling: Incorporates proper Sanskrit punctuation.

Training Details

The model was trained for 3 epochs with a sequence length of 1024 tokens and reached a final training loss of 0.15. LoRA adapters with a rank of 32 were applied to all linear layers, resulting in approximately 16.8 million trainable parameters. Training ran on 2x RTX 4090 GPUs for about 8 hours.
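As background on how the LoRA rank relates to the trainable parameter count, the sketch below applies the standard LoRA arithmetic: each adapted linear layer with input size d_in and output size d_out gains two low-rank matrices, for rank × (d_in + d_out) extra parameters. The function names and the layer shapes are hypothetical and chosen for illustration; they are not the actual Qwen2.5-7B dimensions and are not meant to reproduce the figure quoted above.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer with weight shape (d_out, d_in).

    LoRA learns A (rank x d_in) and B (d_out x rank), so the layer gains
    rank * (d_in + d_out) trainable parameters.
    """
    return rank * (d_in + d_out)


def total_lora_params(layer_shapes: list[tuple[int, int]], rank: int) -> int:
    """Sum the LoRA parameters over a list of (d_in, d_out) layer shapes."""
    return sum(lora_params(d_in, d_out, rank) for d_in, d_out in layer_shapes)


# Hypothetical toy shapes, NOT the real Qwen2.5-7B dimensions.
toy_layers = [(1024, 1024), (1024, 4096), (4096, 1024)]
print(total_lora_params(toy_layers, rank=32))
```

The count grows linearly with the rank and with the number of adapted layers, which is why the choice of target layers matters as much as the rank itself.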

Recommended Use Cases

  • Generating new Sanskrit verses and prose.
  • Completing partial Sanskrit texts.
  • Developing educational tools for Sanskrit learning.
  • Research in Sanskrit language patterns and literature generation.
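
The completion use case could be sketched with the Hugging Face `transformers` library as follows. This is a hedged sketch, not an official usage recipe: it assumes the repository hosts full model weights loadable through `AutoModelForCausalLM`, that `transformers`, `torch`, and `accelerate` are installed, and that enough GPU memory is available. The helper names `generation_kwargs` and `complete` are hypothetical, and the sampling values are illustrative assumptions rather than tuned settings.

```python
MODEL_ID = "diabolic6045/Sanskrit-Qwen2.5-7B-base"


def generation_kwargs(max_new_tokens: int = 128) -> dict:
    """Sampling settings; the values are illustrative assumptions, not tuned."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Continue a Sanskrit prompt with the base model (downloads the weights)."""
    # Imported lazily so the settings helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **generation_kwargs(max_new_tokens))
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because this is a base model rather than a chat model, the prompt should be the opening of the text to continue, for example `complete("धर्मक्षेत्रे कुरुक्षेत्रे")`, not an instruction.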

Limitations

  • Fine-tuned with a sequence length of 1024 tokens, well below the 32k context length listed for the model; behavior on longer inputs is not characterized.
  • Optimized for classical Sanskrit; may perform worse on modern Sanskrit.
  • Not designed for translation or transliteration tasks; a separate chat model is recommended for these.
  • Text-only model, lacking multimodal capabilities.
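
Given the 1024-token training length, one cautious option is to keep the prompt plus the requested continuation within that budget. The sketch below shows this as a plain list operation on token ids. The name `fit_prompt` is hypothetical, and treating 1024 as a hard budget is a conservative assumption; the source does not state that longer inputs fail.

```python
TRAIN_SEQ_LEN = 1024  # sequence length used during fine-tuning, per the training details


def fit_prompt(token_ids: list[int], max_new_tokens: int,
               budget: int = TRAIN_SEQ_LEN) -> list[int]:
    """Keep the most recent tokens so that prompt + generation fits in `budget`."""
    if not 0 < max_new_tokens < budget:
        raise ValueError("max_new_tokens must be between 1 and budget - 1")
    room = budget - max_new_tokens  # tokens left over for the prompt
    return token_ids[-room:]        # shorter prompts are returned unchanged
```

Truncating from the left keeps the text closest to the point of continuation, which is usually what matters for completing a verse.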