avery00/VideoExplorer-TemporalGrounder

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Oct 15, 2025 · License: other · Architecture: Transformer

The avery00/VideoExplorer-TemporalGrounder is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B and optimized for temporal grounding: identifying and locating specific events or actions in video content from textual queries. It is designed for applications that require precise temporal localization in video analysis.


Model Overview

VideoExplorer-TemporalGrounder builds on the Qwen/Qwen2.5-7B base architecture. Its specialization is temporal grounding: given a natural-language description, the model locates precisely where in a video the described event or action occurs.

Key Capabilities

  • Temporal Grounding: identifies and localizes events within videos from textual queries.
  • Fine-tuned Performance: builds on the Qwen2.5-7B base model, fine-tuned on the deepseek_real_video_marathon_temporal_grounding_6k dataset.
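To illustrate how a temporal-grounding query might be framed and its answer consumed, here is a minimal sketch. The prompt template and the `<start>s - <end>s` answer format are assumptions for illustration; the model card does not document the model's actual prompt or output format.

```python
import re

def build_grounding_prompt(query: str, duration_s: float) -> str:
    """Build a temporal-grounding prompt. The template is hypothetical;
    the model card does not specify one."""
    return (
        f"The video lasts {duration_s:.1f} seconds. "
        f"Find when the following event occurs: {query}\n"
        "Answer with a time span in the form <start>s - <end>s."
    )

def parse_time_span(answer: str):
    """Extract a (start, end) span in seconds from a model answer,
    assuming it contains something like '12.5s - 18.0s'."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*s\s*-\s*(\d+(?:\.\d+)?)\s*s", answer)
    return (float(m.group(1)), float(m.group(2))) if m else None

prompt = build_grounding_prompt("a person opens the fridge", 93.0)
span = parse_time_span("The event occurs at 12.5s - 18.0s.")
```

In practice the prompt would be sent to the model (e.g. through a standard causal-LM generation loop) alongside the video features, and the parsed span would index back into the video timeline.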

Training Details

The model was trained for 8 epochs with a learning rate of 1e-05, a total batch size of 256 across 8 devices, the paged_adamw_8bit optimizer, a cosine learning rate scheduler, and a warmup ratio of 0.1.
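The reported hyperparameters can be collected into a single configuration sketch. The variable names below are illustrative rather than taken from the actual training script, and the per-device batch size assumes the 256-sample total was split evenly across the 8 devices with no gradient accumulation.

```python
# Hyperparameters reported in the model card.
total_batch_size = 256
num_devices = 8

config = {
    "learning_rate": 1e-05,
    "optimizer": "paged_adamw_8bit",
    "num_epochs": 8,
    "lr_scheduler": "cosine",
    "warmup_ratio": 0.1,
    # Assuming an even split and no gradient accumulation,
    # each device processes this many samples per step:
    "per_device_batch_size": total_batch_size // num_devices,
}
```

With these assumptions, each of the 8 devices sees a per-device batch of 32 samples per optimizer step.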

Good For

  • Applications requiring precise event localization in video.
  • Research and development in video understanding and temporal reasoning.