QCRI/MemeLens-VLM
MemeLens-VLM by QCRI is an 8-billion-parameter multilingual, multitask Vision-Language Model (VLM) fine-tuned from Qwen3-VL-8B-Instruct. It specializes in meme understanding across 20 tasks and 9 languages, and was trained with a classify-then-explain strategy on a consolidation of 38 public meme datasets. The model identifies harm, targets, figurative/pragmatic intent, and affect in memes, and outperforms zero-shot VLM baselines on these tasks.
MemeLens-VLM: Multilingual Multitask Meme Understanding
MemeLens-VLM, developed by QCRI, is an 8-billion-parameter Vision-Language Model (VLM) built for meme understanding. It is fine-tuned from Qwen3-VL-8B-Instruct with a classify-then-explain training strategy on the MemeLens dataset, which consolidates 38 public meme datasets.
Key Capabilities
- Multilingual Support: Processes memes in 9 languages: Arabic, Bengali, German, English, Spanish, Hindi, Romanian, Russian, and Chinese.
- Multitask Proficiency: Addresses 20 distinct meme understanding tasks across categories such as harm detection (hateful, toxic, abusive), target identification (misogyny, stereotyping), figurative/pragmatic intent (propaganda, sarcasm, deepfake), and affect (humor, offensive, motivational).
- Benchmark Performance: Achieves an overall accuracy of 74.1% and a Macro-F1 of 0.625 on meme understanding benchmarks, outperforming zero-shot baselines such as GPT-4.1, Qwen3-VL-8B-Instruct, and InternVL3.5-8B.
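The accuracy and Macro-F1 figures above measure different things, which is why they can sit far apart on imbalanced meme datasets. The sketch below is only an illustration: it computes both metrics on a small made-up binary task. The label names and the toy data are hypothetical, and the card does not say how its scores are aggregated across the 20 tasks, so this should not be read as the authors' evaluation script.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 benign memes, 2 hateful ones, and a
# degenerate classifier that always predicts the majority class.
gold = ["not_hateful"] * 8 + ["hateful"] * 2
pred = ["not_hateful"] * 10

print(f"accuracy = {accuracy(gold, pred):.3f}")  # accuracy = 0.800
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")  # macro-F1 = 0.444
```

The majority-class predictor reaches 80% accuracy yet scores only 0.444 Macro-F1, because the missed minority class contributes an F1 of zero. This is why Macro-F1 is usually reported next to accuracy for tasks such as harm detection, where the harmful class tends to be rare.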
Good for
- Social Media Analysis: Ideal for platforms and researchers needing to automatically detect and analyze complex content within memes, such as hate speech, propaganda, or specific emotional tones.
- Content Moderation: Identifies harmful or inappropriate content in memes, drawing on both the image and its embedded text, across the supported languages.
- Cross-Cultural Meme Research: Facilitates studies on how memes convey meaning, intent, and sentiment across different languages and cultural nuances.