================================================================================
Video alignment dataset 1.0
Anestis Papazoglou, Luca Del Pero, Vittorio Ferrari
================================================================================

This dataset contains 22 video sequences of racing cars collected from
YouTube, each 5-30 seconds long. We also provide viewpoint annotations for
each frame, created as follows. We first defined a set of 16 canonical
viewpoints, spaced 22.5 degrees apart (starting from full frontal). We
manually annotated all frames showing one of these canonical viewpoints, and
then automatically annotated the remaining frames by linearly interpolating
between the manual annotations.

We release the individual video frames after decompression, in order to
eliminate possible confusion when decoding the videos and in the frame
numbering. The frames are stored in .jpg format. For each class, all frames
from all shots are concatenated and named sequentially using 8 digits
(e.g. 00000001.jpg, 00000002.jpg, etc.).

We also release the ground-truth viewpoint annotations, as well as the
automatic segmentations used in [1]. The annotation files and further
metadata about the videos are stored as .mat files and require MATLAB to
parse (Section 4 sketches a possible Python alternative).

================================================================================
1. Ranges
================================================================================

We provide a ranges.mat file for each class, specifying which frames belong
to which shot. It contains an (N+1)x1 array, where N is the number of shots
for the class. Element i of the array denotes the index of the first frame
of the i-th shot.

We also provide a positiveRanges.mat file for each class, specifying which
shots contain an instance of the object class in at least one frame. It
contains an Mx1 array, where M is the number of shots containing the object
class. Each element of the array is the index of a shot containing the
object class.

Note: to save space, shots that do not contain the object class are not
included in the archives.

================================================================================
2. Ground-truth
================================================================================

The ground-truth annotations for each class are placed under the folder
"GroundTruth/Viewpoint". The ground-truth for each annotated frame is stored
in a separate file named annotationXXXXXXXX.mat, where XXXXXXXX is the
8-digit index of the corresponding frame.

Each file contains a structure with a single field ("yaw") holding the
viewpoint annotation of the object in radians. A value of NaN indicates that
the object is not visible in that frame.

================================================================================
3. Segmentations
================================================================================

We provide segmentations produced by the method described in [1] for all
shots that contain at least one instance of the object class. The
segmentation for each shot is stored as a file named "segmentationShotX.mat",
where X is the shot index.

Each file contains a Kx1 cell array, where K is the number of frames in that
shot. Each element of the cell array is the binary segmentation mask
(1 = foreground, 0 = background) for the corresponding frame in the shot,
i.e. the first element corresponds to the first frame of the shot, and so on.
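================================================================================
4. Parsing the .mat files in Python
================================================================================

If you prefer not to use MATLAB, the annotation files can typically also be
read in Python via scipy.io.loadmat. The sketch below is a minimal example,
not part of the release: it assumes the .mat files are stored in a pre-v7.3
format (v7.3 files would need an HDF5 reader instead), that each file holds
a single variable, and that "Car" is a valid class directory. Adapt the
paths to your local layout.

import numpy as np
import scipy.io


def load_single_var(path):
    """Load a .mat file and return its single non-metadata variable."""
    mat = scipy.io.loadmat(path, squeeze_me=True, struct_as_record=False)
    names = [k for k in mat if not k.startswith("__")]
    assert len(names) == 1, f"expected one variable in {path}, got {names}"
    return mat[names[0]]


# Ranges: element i is the first frame of shot i (1-based). The extra
# (N+1)-th element presumably marks one past the end of the last shot, so
# shot i would span frames ranges[i-1] .. ranges[i]-1.
ranges = np.atleast_1d(load_single_var("Car/ranges.mat"))
positive = np.atleast_1d(load_single_var("Car/positiveRanges.mat"))
shot = int(positive[0])
first_frame = int(ranges[shot - 1])
last_frame = int(ranges[shot]) - 1

# Ground-truth viewpoint for one frame (assuming an annotation file exists
# for this frame). The "yaw" field is in radians; NaN means the object is
# not visible.
ann = load_single_var(
    f"Car/GroundTruth/Viewpoint/annotation{first_frame:08d}.mat")
yaw = float(ann.yaw)

# Segmentations: a Kx1 cell array of binary masks, one per frame of the shot.
seg = load_single_var(f"Car/segmentationShot{shot}.mat")
masks = [np.asarray(m, dtype=bool) for m in np.atleast_1d(seg).ravel()]
print(f"shot {shot}: frames {first_frame}-{last_frame}, "
      f"{len(masks)} masks, yaw of first frame = {yaw:.3f} rad")

Here squeeze_me=True collapses MATLAB's 1x1 arrays to scalars, and
struct_as_record=False exposes struct fields as attributes, so the
annotation struct's "yaw" field is accessible as ann.yaw.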
================================================================================
References
================================================================================

[1] Video temporal alignment for object viewpoint.
    Anestis Papazoglou, Luca Del Pero, Vittorio Ferrari.
    In ACCV, 2016.

================================================================================
Support
================================================================================

For any query/suggestion/complaint, please send us an email:
papazoglou.anestis@gmail.com