================================================================================
Aspects dataset 1.0
Anestis Papazoglou, Luca Del Pero, Vittorio Ferrari
================================================================================

This dataset contains video shots for two object classes: tigers and cars. We
collected the shots from 188 car ads (~1-2 min each) and 14 nature
documentaries about tigers (~40 min each), amounting to roughly 14 h of video.
We automatically partitioned these raw videos into shorter shots and kept only
those showing at least one instance of the class. This produced 806 shots for
the car class and 1880 for the tiger class, typically 1-100 sec in length.

We release individual video frames after decompression, in order to eliminate
possible confusion when decoding the videos and in the frame numbering. The
frames are stored in .jpg format. For each class, all frames from all shots
are concatenated and named sequentially using 8 digits (e.g. 00000001.jpg,
00000002.jpg, etc.).

We also release the ground-truth aspect annotations as well as the automatic
segmentations used in [1]. The annotation files and further metadata about the
videos are stored as .mat files and require MATLAB (or a compatible .mat
parser) to read.

================================================================================
1. Ranges
================================================================================

We provide a ranges.mat file for each class, specifying which frames belong to
which shot. It contains an (N+1)x1 array, where N is the number of shots for
the class. Element i of the array is the index of the first frame of the i-th
shot.

We also provide a positiveRanges.mat file for each class, specifying which
shots contain an instance of the object class in at least one frame. It
contains an Mx1 array, where M is the number of shots containing the object
class. Each element of the array is the index of a shot containing the object
class.
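As an illustration of the ranges format, the Python sketch below maps a shot index to its frame file names. The array values are made up for demonstration, and we additionally assume that the extra (N+1)-th element marks one past the last frame of the last shot; in practice the array would be extracted from ranges.mat, e.g. with scipy.io.loadmat.

```python
import numpy as np

# Illustrative stand-in for the (N+1)x1 array stored in ranges.mat.
# These values are made up: 3 shots starting at frames 1, 101 and 351,
# with the final element assumed to mark one past the last frame.
ranges = np.array([1, 101, 351, 601])

def shot_frame_names(ranges, shot_idx):
    """Return the frame file names of the 1-based shot index shot_idx,
    assuming frames are named with 8-digit indices as described above."""
    start = int(ranges[shot_idx - 1])  # first frame of this shot
    end = int(ranges[shot_idx])        # first frame of the next shot
    return ["%08d.jpg" % i for i in range(start, end)]

frames = shot_frame_names(ranges, 2)
print(frames[0], frames[-1], len(frames))
```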
Note: for space reasons, the archives do not include the shots that do not
contain the object class.

================================================================================
2. Ground-truth
================================================================================

The ground-truth annotations for each class are placed under the folder
"GroundTruth/Aspect". The ground-truth for each annotated frame is stored in a
separate file named annotationXXXXXXXX.mat, where XXXXXXXX is the 8-digit
index of the corresponding frame. Each file contains a structure with the
following fields:

-------------------------------- Tiger class -----------------------------------

objects_num:        The number of class objects in the frame.
faceid:             An index (0 to 6) indicating the orientation of the face.
                    The indices correspond to 'not visible', 'left',
                    'left-front', 'front', 'right-front', 'right' and 'top'
                    respectively.
face:               A string representation of the corresponding faceid.
front_feetid:       A value (0 to 4) indicating the action of the front legs:
                    0 is 'not visible', 1 is 'standing', 2 is 'walking',
                    3 is 'running', 4 is 'lying'.
front_feet:         A string representation of the above value.
back_feetid:        A value (0 to 4) indicating the action of the back legs,
                    with the same encoding as front_feetid.
back_feet:          A string representation of the above value.
sternumid:          A binary value indicating whether the part is visible.
sternum:            A string representation of the above value.
buttocksid:         A binary value indicating whether the part is visible.
buttocks:           A string representation of the above value.
shoulder_bladesid:  An index indicating which shoulder blade is visible:
                    0 corresponds to none, 1 to left and 2 to right.
shoulder_blades:    A string representation of the above value.
thighsid:           A binary value indicating whether the part is visible.
thighs:             A string representation of the above value.
backid:             A binary value indicating whether the part is visible.
back:               A string representation of the above value.
bellyid:            A binary value indicating whether the part is visible.
belly:              A string representation of the above value.
segmentationid:     A value (0 to 2) indicating the quality of the provided
                    segmentation: 0 is a wrong segmentation, 1 is mostly on
                    the object but inaccurate, 2 is perfect.
segmentation:       A string representation of the above value.

--------------------------------- Car class ------------------------------------

objects_num:        The number of class objects in the frame.
roof:               A binary value indicating whether the part is visible.
windshield_front:   A binary value indicating whether the part is visible.
windshield_rear:    A binary value indicating whether the part is visible.
light_front_left:   A binary value indicating whether the part is visible.
light_front_right:  A binary value indicating whether the part is visible.
light_rear_left:    A binary value indicating whether the part is visible.
light_rear_right:   A binary value indicating whether the part is visible.
wheel_front_left:   A binary value indicating whether the part is visible.
wheel_front_right:  A binary value indicating whether the part is visible.
wheel_rear_left:    A binary value indicating whether the part is visible.
wheel_rear_right:   A binary value indicating whether the part is visible.
door_front_left:    A binary value indicating whether the part is visible.
door_front_right:   A binary value indicating whether the part is visible.
segmentationid:     A value (0 to 2) indicating the quality of the provided
                    segmentation: 0 is a wrong segmentation, 1 is mostly on
                    the object but inaccurate, 2 is perfect.
segmentation:       A string representation of the above value.

================================================================================
3. Segmentations
================================================================================

We provide segmentations produced by [2] for all shots that contain at least
one instance of the object class. The segmentation for each shot is stored as
a file named "segmentationShotX.mat", where X is the shot index. Each file
contains a Kx1 cell array, where K is the number of frames in that shot. Each
element of the cell array is the binary segmentation mask (1 = foreground,
0 = background) for the corresponding frame in the shot: the first element
corresponds to the first frame of the shot, and so on.

================================================================================
References
================================================================================

[1] Discovering object aspects from video
    Anestis Papazoglou, Luca Del Pero, Vittorio Ferrari
    Image and Vision Computing, August 2016

[2] Fast object segmentation in unconstrained video
    Anestis Papazoglou, Vittorio Ferrari
    International Conference on Computer Vision (ICCV), 2013

================================================================================
Support
================================================================================

For any query, suggestion or complaint, please send us an email:
papazoglou.anestis@gmail.com