Abstract
This paper (CVPR 2018, spotlight presentation) considers the problem of estimating repetition in video, such as performing push-ups, cutting a melon or playing the violin. Existing work shows good results under the assumption of static and stationary periodicity. As realistic video is rarely perfectly static and stationary, the often preferred Fourier-based measurement is inapt. Instead, we adopt the wavelet transform to better handle non-static and non-stationary video dynamics. From the flow field and its differentials, we derive three fundamental motion types and three motion continuities of intrinsic periodicity in 3D. On top of this, the 2D perception of 3D periodicity considers two extreme viewpoints. What follows are 18 fundamental cases of recurrent perception in 2D. In practice, to deal with the variety of repetitive appearance, our theory implies measuring time-varying flow and its differentials (gradient, divergence and curl) over segmented foreground motion. We construct a new dataset, better reflecting reality by including more non-static and non-stationary videos, and report experiments. On the task of counting repetitions in video, we obtain favorable results compared to a deep learning alternative.
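To illustrate why a wavelet transform suits non-stationary repetition, here is a minimal sketch, not the authors' implementation. It assumes the video has already been reduced to a one-dimensional motion signal sampled at a known frame rate. The function names (`morlet_power`, `count_repetitions`), the Morlet wavelet with `w0 = 6`, and the maximum-power rule for picking the local frequency are all assumptions made for illustration. The count is obtained by integrating the locally dominant frequency over time, so a repetition that speeds up is still counted correctly.

```python
import numpy as np


def morlet_power(signal, freqs, fps, w0=6.0):
    """Morlet wavelet power of a 1D signal, shape (len(freqs), len(signal))."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the constant offset
    n = len(x)
    power = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        scale = w0 / (2.0 * np.pi * f)  # std of the Gaussian envelope, in seconds
        half = int(min(4.0 * scale * fps, (n - 1) // 2))  # keep the kernel shorter than the signal
        t = np.arange(-half, half + 1) / fps
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * scale**2))
        wavelet /= np.abs(wavelet).sum()  # equal-amplitude response at every frequency
        power[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return power


def count_repetitions(signal, fps, freqs):
    """Integrate the locally dominant frequency (Hz) over time to get a cycle count."""
    freqs = np.asarray(freqs, dtype=float)
    power = morlet_power(signal, freqs, fps)
    inst_freq = freqs[np.argmax(power, axis=0)]  # dominant frequency per frame
    return inst_freq.sum() / fps


# Synthetic non-stationary example: the repetition speeds up from 0.5 Hz to 1.5 Hz.
fps = 30.0
t = np.arange(600) / fps                              # 20 seconds of "video"
phase = 2.0 * np.pi * (0.5 * t + 0.025 * t**2)        # 20 full cycles in total
signal = np.sin(phase)
freqs = np.linspace(0.2, 3.0, 141)
count = count_repetitions(signal, fps, freqs)         # should land close to 20
```

A single global Fourier peak would report one frequency for the whole clip, which is only meaningful when the period stays fixed; the per-frame frequency used here is what lets the estimate follow a changing tempo.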
Repetition Estimation
Visual repetition is ubiquitous in the world around us. It is present in activities like rowing, music-making and cooking. It arises in natural and urban environments: traffic patterns, blinking lights, and leaves in the wind. Rhythm and repetition are used to approximate velocity, estimate progress and trigger attention. In computer vision, understanding repetition in video is important as it can serve action classification, action localization, human motion analysis, 3D reconstruction and camera calibration. Estimating repetition remains challenging. First and foremost, repetition appears in many forms due to its variety in motion pattern and motion continuity. The viewpoint is crucial for the perception of recurrence. In practice, camera motion makes repetition estimation even harder.
We reconsider the theory of repetition starting from the divergence, gradient and curl operators acting on the 3D flow field. From these we derive three motion types and three motion continuities, which combine into 3×3 = 9 fundamental cases of intrinsic periodicity in 3D (see image below). For the 2D perception of 3D intrinsic periodicity, the observer's viewpoint can lie anywhere in the continuous range between two viewpoint extremes. Considering both extremes for each of the 9 intrinsic cases, we ultimately distinguish 18 fundamental cases for the 2D perception of 3D intrinsic periodic motion.
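As a rough illustration of the differential operators mentioned above, the following sketch computes the gradient, divergence and curl of a 2D optical-flow field with finite differences. It is an assumption-laden example rather than the paper's pipeline: the function name `flow_differentials`, the unit pixel spacing, and the synthetic test fields are all hypothetical. The flow is given as two arrays `u` (horizontal) and `v` (vertical) of shape `(H, W)`.

```python
import numpy as np


def flow_differentials(u, v):
    """Finite-difference gradient, divergence and curl of a 2D flow field (u, v).

    Axis 0 is treated as y (rows) and axis 1 as x (columns), with unit spacing.
    """
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    return {
        "grad_u": (du_dx, du_dy),
        "grad_v": (dv_dx, dv_dy),
        "div": du_dx + dv_dy,    # positive where the flow expands
        "curl": dv_dx - du_dy,   # non-zero where the flow rotates
    }


# Two synthetic flow fields on an 11 x 11 grid centred at the origin.
y, x = np.mgrid[-5:6, -5:6].astype(float)
expansion = flow_differentials(x, y)    # flow pointing away from the centre
rotation = flow_differentials(-y, x)    # flow circling around the centre
```

For the expanding field the divergence is constant and the curl vanishes, while for the rotating field the opposite holds. Averaging such maps over a foreground mask, frame by frame, would give one time-varying signal per operator.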
Supplementary Material Video
QUVA Repetition Dataset
The QUVA Repetition dataset consists of 100 videos displaying a wide variety of repetitive video dynamics, including swimming, stirring, cutting, combing and music-making. All videos have been annotated with individual cycle bounds and a total repetition count. The dataset will soon be available for download.
Citation
Runia, T.F.H., Snoek, C.G.M., & Smeulders, A.W.M. (2018). Real-World Repetition Estimation by Div, Grad and Curl. IEEE CVPR 2018.
BibTeX format:
@InProceedings{runia2018repetition,
title = {Real-World Repetition Estimation by Div, Grad and Curl},
author = {Runia, Tom F H and Snoek, Cees G M and Smeulders, Arnold W M},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}