LongVideoAgent Qwen3-4B: Specialized for Long-Video QA
LongVideoAgent Qwen3-4B is a 4-billion-parameter language model checkpoint built on the Qwen3 architecture. It was developed as part of the LongVideoAgent multi-agent framework, which tackles complex question answering over long videos by decomposing reasoning into specialized roles. The model is intended for use within the official LongVideoAgent codebase and its evaluation pipeline.
Key Capabilities & Performance
- Long-Video Question Answering: Optimized for reasoning and extracting information from extensive video data.
- Multi-Agent Framework: Functions as a core component within a collaborative multi-agent system for complex tasks.
- High Context Length: Natively supports a context window of 262,144 tokens, which is crucial for processing detailed long-video information.
- Strong Performance: Achieves 72% accuracy on the challenging LongTVQA+ test set, within two percentage points of the closed-source gpt-4o-mini (74% on the same benchmark).
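The context window figure above implies a simple token budget: the prompt plus any tokens reserved for generation must not exceed 262,144. The following is a minimal sketch of that arithmetic only. The function names `remaining_budget` and `fits_in_context` are hypothetical helpers, not part of the LongVideoAgent codebase, and token counts are assumed to come from whatever tokenizer the pipeline uses.

```python
# Native context length stated in this model card.
CONTEXT_WINDOW = 262_144


def remaining_budget(prompt_tokens: int,
                     reserved_for_output: int,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Return the tokens left after the prompt and the reserved output.

    A negative result means the request would overflow the window.
    Hypothetical helper for illustration only.
    """
    if prompt_tokens < 0 or reserved_for_output < 0:
        raise ValueError("token counts must be non-negative")
    return context_window - prompt_tokens - reserved_for_output


def fits_in_context(prompt_tokens: int,
                    reserved_for_output: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus reserved output fits in the window."""
    return remaining_budget(prompt_tokens, reserved_for_output,
                            context_window) >= 0


if __name__ == "__main__":
    # Example values are made up; real counts depend on the tokenizer.
    print(remaining_budget(200_000, 4_096))
    print(fits_in_context(260_000, 4_096))
```

A check like this is typically run before sending accumulated video evidence to the model, so that oversized inputs can be trimmed or summarized rather than silently truncated.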
Intended Use Cases
- Research: Ideal for academic research in long-video question answering and multi-agent systems.
- Reproducibility: Designed for reproducing experiments and results from the LongVideoAgent project.
- Agentic Reasoning Studies: Useful for studying how agentic reasoning can be applied to long video analysis.
This checkpoint is not a general-purpose video model but a specialized component of the LongVideoAgent framework. To use it as intended, refer to the official LongVideoAgent repository and its paper on arXiv.