Name: Tamil-ai/tamil-qwen25-7b-instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Tamil-ai

Model Overview

Tamil-Qwen2.5-7B-Instruct is a specialized large language model developed by Tamil-ai, fine-tuned from the Qwen2.5-7B-Instruct base model. With 7.6 billion parameters, it focuses on enhancing performance for the Tamil language, particularly in instruction-following and linguistic tasks. The model was trained using QLoRA on a comprehensive dataset of 150,000 deduplicated Tamil instruction-response pairs, including data from Tamil Alpaca, Tamil Orca, Tamil Dolly, and specialized morphological drills.

Key Differentiators

Optimized Tamil Tokenization: Built on Qwen2.5, which demonstrates a 4.62x Tamil token ratio, making it significantly more efficient for processing Tamil compared to other base models like Llama 3.1 (5.8x) or Mistral (7.2x).
Specialized Training Data: Fine-tuned on a diverse set of Tamil instruction pairs, including specific morphological and grammar QA data, to improve linguistic understanding.

Intended Use Cases

Tamil Question Answering: Excels at understanding and responding to queries in Tamil.
Morphological Analysis: Designed for tasks involving the analysis of Tamil word structures.
Grammar and Linguistics: Suitable for research and applications requiring deep understanding of Tamil grammar.
Low-Resource Language Research: A valuable tool for exploring and developing LLMs for languages with limited digital resources.

Limitations

Performance may be reduced for colloquial or slang Tamil due to its primary training on instructional content.
English language capabilities might be degraded compared to the original Qwen2.5-7B-Instruct base model.

Overview

Model Overview

Key Differentiators

Intended Use Cases

Limitations

Full Model Card (README)