LlamaLens: Specialized Multilingual Content Analysis LLM
LlamaLens is an 8 billion parameter multilingual large language model developed by QCRI, specifically engineered for the in-depth analysis of news and social media content. It focuses on 18 distinct Natural Language Processing (NLP) tasks, utilizing 52 diverse datasets spanning Arabic, English, and Hindi.
Key Capabilities
- Multilingual Analysis: Proficient in analyzing content in Arabic, English, and Hindi.
- Broad NLP Task Coverage: Addresses 18 NLP tasks, including:
- Attentionworthiness and Checkworthiness Detection
- Claim, Cyberbullying, Emotion, and Factuality Detection
- Harmfulness, Hate Speech, and Offensive Language Detection
- News Categorization and Summarization
- Propaganda, Sarcasm, Sentiment, Stance, and Subjectivity Detection
- Performance: Demonstrates strong performance across various tasks, often surpassing or closely matching SOTA benchmarks and outperforming the Llama-Instruct 3.1 baseline, particularly in tasks like News Categorization in Arabic and English, and Hate Speech Detection in Hindi.
Good for
- Social Media Monitoring: Ideal for platforms requiring nuanced understanding of user-generated content.
- News Analysis: Suitable for applications involving fact-checking, sentiment analysis, and categorization of news articles.
- Multilingual NLP Research: Provides a specialized tool for researchers working on content analysis in Arabic, English, and Hindi.
For a comprehensive understanding of the model's development and performance, refer to the LlamaLens paper.