longvideoagent/longvideoagent-qwen3-4b

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 3, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

LongVideoAgent Qwen3-4B is a 4-billion-parameter language model checkpoint based on Qwen3, developed for long-video question answering within the LongVideoAgent multi-agent framework. It is designed to decompose and reason over complex long-video content. The model reaches 72% accuracy on the LongTVQA+ test set, close to the 74% reported for the closed-source gpt-4o-mini on the same benchmark. The model card states a native context length of 262,144 tokens, intended for long video transcripts and related data; the listing metadata above gives a context length of 32k.


LongVideoAgent Qwen3-4B: Specialized for Long-Video QA

LongVideoAgent Qwen3-4B is a 4-billion-parameter language model checkpoint built on the Qwen3 architecture. It was developed as part of the LongVideoAgent multi-agent framework, which tackles question answering over long video content by decomposing reasoning into specialized roles. The checkpoint is intended for use within the official LongVideoAgent codebase and its evaluation pipeline.
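To illustrate what "decomposing reasoning into specialized roles" can look like, here is a minimal sketch of a coordinator that runs a sequence of roles and passes accumulated notes between them. It is not the LongVideoAgent implementation: the `Coordinator` class, the `locate` and `read` role names, and the stubbed outputs are all hypothetical, and in a real system each role would call a model rather than return a fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A role takes the question plus the notes gathered so far and returns text.
# Hypothetical interface; the real framework's roles may look different.
Role = Callable[[str, List[str]], str]


@dataclass
class Coordinator:
    roles: Dict[str, Role]
    notes: List[str] = field(default_factory=list)

    def answer(self, question: str, plan: List[str]) -> str:
        """Run each named role in order and return the last role's output."""
        result = ""
        for name in plan:
            result = self.roles[name](question, self.notes)
            self.notes.append(f"{name}: {result}")
        return result


def locate(question: str, notes: List[str]) -> str:
    # Stub: a real role would search the video for the relevant segment.
    return "clip 12"


def read(question: str, notes: List[str]) -> str:
    # Stub: a real role would answer from the segment found earlier.
    return f"answer based on {notes[-1]}"


coordinator = Coordinator(roles={"locate": locate, "read": read})
final = coordinator.answer("Who opens the door?", plan=["locate", "read"])
print(final)
```

Keeping each role behind the same small interface means a stub can be swapped for a model-backed role without changing the coordinator.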

Key Capabilities & Performance

  • Long-Video Question Answering: Optimized for reasoning and extracting information from extensive video data.
  • Multi-Agent Framework: Functions as a core component within a collaborative multi-agent system for complex tasks.
  • High Context Length: The model card states a native context window of 262,144 tokens, which matters for long transcripts and other detailed long-video inputs. The listing metadata gives a context length of 32k, so the usable window may depend on where the model is served.
  • Strong Performance: Achieves 72% accuracy on the LongTVQA+ test set, within 2 percentage points of the closed-source gpt-4o-mini, which scores 74% on the same benchmark.
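The two context figures above (262,144 tokens native, 32k in the listing) lead to very different chunking of a long transcript. The sketch below shows the arithmetic under stated assumptions: "32k" is taken to mean 32,768 tokens, token counts are estimated with a rough 4-characters-per-token heuristic instead of the real tokenizer, and the `reserve` value for the prompt and answer is an arbitrary illustrative choice. The function names are hypothetical and not part of the LongVideoAgent codebase.

```python
from typing import List

NATIVE_CTX = 262_144   # native window stated in the model card
SERVED_CTX = 32_768    # assumption: the listing's "32k" means 32,768 tokens
CHARS_PER_TOKEN = 4    # rough heuristic; use the real tokenizer for exact counts


def estimate_tokens(text: str) -> int:
    """Estimate the token count with ceiling division over characters."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_transcript(lines: List[str], ctx_limit: int, reserve: int = 2048) -> List[List[str]]:
    """Greedily pack transcript lines into chunks that fit ctx_limit - reserve."""
    budget = ctx_limit - reserve
    if budget <= 0:
        raise ValueError("reserve must be smaller than ctx_limit")
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for line in lines:
        cost = estimate_tokens(line) + 1  # +1 for the line break
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Synthetic transcript: 1,000 lines of 400 characters (about 101 tokens each).
transcript = ["x" * 400] * 1000
served_chunks = chunk_transcript(transcript, SERVED_CTX)
native_chunks = chunk_transcript(transcript, NATIVE_CTX)
print(len(served_chunks), len(native_chunks))
```

On this synthetic input of roughly 101,000 estimated tokens, the 32,768-token window needs several chunks while the 262,144-token window holds the transcript in one.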

Intended Use Cases

  • Research: Ideal for academic research in long-video question answering and multi-agent systems.
  • Reproducibility: Designed for reproducing experiments and results from the LongVideoAgent project.
  • Agentic Reasoning Studies: Useful for studying how agentic reasoning can be applied to long video analysis.

This checkpoint is not a general-purpose video model; it is a specialized component of the LongVideoAgent framework. To use it as intended, refer to the official LongVideoAgent repository and the accompanying paper on arXiv.