Exploring video content structure for hierarchical summarization

Xingquan Zhu*, Xindong Wu, Jianping Fan, Ahmed K. Elmagarmid, Walid G. Aref

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

73 Citations (Scopus)

Abstract

In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide users with a scalable, multilevel video summary. First, video-shot-segmentation and keyframe-extraction algorithms are applied to parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Since high similarity between video shots does not necessarily imply that they belong to the same story unit, temporal information is adopted by merging temporally adjacent shots (within a specified distance) from each supergroup into a video group. A video-scene-detection algorithm is then proposed to merge temporally or spatially correlated video groups into scenario units. This is followed by a scene-clustering algorithm that eliminates visual redundancy among the units. A hierarchical video content structure with increasing granularity is constructed from the clustered scenes, video scenes, and video groups to keyframes. Finally, we introduce a hierarchical video summarization scheme by executing various approaches at different levels of the video content hierarchy to statically or dynamically construct the video summary. Extensive experiments based on real-world videos have been performed to validate the effectiveness of the proposed approach.
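
The grouping steps described in the abstract (an affinity matrix over shots, supergroups of visually similar shots, then a split by temporal distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity affinity, the thresholded connected-components clustering, and the names `affinity_matrix`, `supergroups`, `split_by_time`, `threshold`, and `max_gap` are all assumptions chosen for illustration, and each shot is assumed to be already reduced to one feature vector.

```python
import numpy as np


def affinity_matrix(features):
    """Cosine similarity between every pair of shot feature vectors (assumed measure)."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    return f @ f.T


def supergroups(affinity, threshold):
    """Label shots by connected components of the graph that links
    two shots whenever their affinity is at least `threshold`."""
    n = affinity.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and affinity[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def split_by_time(labels, max_gap):
    """Split each supergroup into video groups: consecutive members stay
    together only if their shot indices differ by at most `max_gap`."""
    groups = []
    for label in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == label]
        current = [members[0]]
        for idx in members[1:]:
            if idx - current[-1] <= max_gap:
                current.append(idx)
            else:
                groups.append(current)
                current = [idx]
        groups.append(current)
    return sorted(groups)


# Hypothetical toy input: eight shots in temporal order, one feature vector each.
shots = [
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0],
]
labels = supergroups(affinity_matrix(shots), threshold=0.9)
groups = split_by_time(labels, max_gap=2)
print(labels)
print(groups)
```

In this toy input, shot 7 looks like shots 0 and 2 and so falls into the same supergroup, but it is too far away in time and ends up in a video group of its own, which is the situation the temporal constraint is meant to handle.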

Original language: English
Pages (from-to): 98-115
Number of pages: 18
Journal: Multimedia Systems
Volume: 10
Issue number: 2
Publication status: Published - Aug 2004
Externally published: Yes

Keywords

  • Hierarchical clustering
  • Hierarchical video summarization
  • Video content hierarchy
  • Video group detection
  • Video scene detection
