avery00/VideoExplorer-TemporalGrounder
Model Overview
The avery00/VideoExplorer-TemporalGrounder is a 7.6 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-7B architecture. Its primary specialization is temporal grounding, a task focused on precisely locating specific events or actions within video content based on natural language descriptions.
Key Capabilities
- Temporal Grounding: Optimized for identifying and localizing events within videos.
- Fine-tuned Performance: Builds on the Qwen2.5-7B base model, enhanced through targeted training on the deepseek_real_video_marathon_temporal_grounding_6k dataset.
Training Details
The model was fine-tuned with a learning rate of 1e-05 and a total batch size of 256 across 8 devices, using the paged_adamw_8bit optimizer. Training ran for 8 epochs with a cosine learning-rate schedule and a warmup ratio of 0.1.
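A total batch size of 256 across 8 devices implies 32 samples per device per step, possibly split between the per-device batch and gradient accumulation. As a rough sketch, the reported hyperparameters map onto a Hugging Face trainer-style configuration like the following; the per-device/accumulation split is an assumption, since only the total is reported.

```python
# Reported hyperparameters from the card, arranged as a trainer-style
# config dict. The 16 x 2 per-device split is a guess -- only the total
# batch size of 256 is actually stated.
NUM_DEVICES = 8
train_config = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 16,   # assumed
    "gradient_accumulation_steps": 2,    # assumed
    "num_train_epochs": 8,
    "optim": "paged_adamw_8bit",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}

# Effective batch size = devices x per-device batch x accumulation steps.
total_batch = (
    NUM_DEVICES
    * train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
)
assert total_batch == 256  # matches the reported total
```

Any split whose product with the device count equals 256 (e.g. 32 x 1) would be equally consistent with the card.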
Good For
- Applications requiring precise event localization in video.
- Research and development in video understanding and temporal reasoning.