Idiap/ETHZ Faces and Poses Dataset
A corpus of news items for automatic face and pose annotation.
Luo Jie, Barbara Caputo, Vittorio Ferrari
Overview
This dataset contains 1703 image-caption pairs, first used in [1].
Captions contain the names of some of the persons appearing in the corresponding image, as well as verbs indicating what they are doing.
The images were collected by querying Google Images using query keywords generated by combining different names (sport stars and politicians) and verbs (from sports and social interactions).
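Combining names and verbs in this way can be pictured as a Cartesian product of a name list and a verb list. The sketch below assumes that each query is the name followed by the verb. The actual query format and word lists used to build the dataset are not given here, so `build_queries` and the example lists are hypothetical placeholders.

```python
from itertools import product


def build_queries(names, verbs):
    """Pair every name with every verb, giving one query string per combination.

    Assumption: a query is "<name> <verb>"; the real query format is not documented.
    """
    return [f"{name} {verb}" for name, verb in product(names, verbs)]


# Hypothetical placeholder lists, not the ones used to collect the images.
names = ["Athlete X", "Politician Y"]
verbs = ["serve", "shake hands"]

for query in build_queries(names, verbs):
    print(query)
```

With n names and m verbs this produces n × m queries, so the number of queries grows quickly as either list is extended.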
The captions are derived from the text snippets returned by Google Images. They typically mention the action of at least one person in the image, as well as names and verbs that do not appear in the image.
In addition to the image-caption pairs, this release also includes:
- ground-truth associations between names and verbs in the captions
- ground-truth lists of which names from the caption appear in the images
- ground-truth locations of the persons in the images
- name-verb pairs extracted automatically from the captions using [2]
- face and upper-body bounding-boxes detected using [3,4]. These are included to facilitate a direct comparison with our results.
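The following sketch holds one image-caption pair in memory, which makes the relationship between these annotations concrete. The class, its field names and the two helpers are hypothetical and do not reflect how `captions.mat` actually stores the data. The helpers show which caption names are absent from the image, and which ground-truth name-verb pairs an automatic extractor did not recover.

```python
from dataclasses import dataclass, field

Pair = tuple[str, str]  # (name, verb)


@dataclass
class AnnotatedItem:
    """Hypothetical in-memory record for one image-caption pair."""
    caption: str
    caption_names: set[str] = field(default_factory=set)   # names mentioned in the caption
    names_in_image: set[str] = field(default_factory=set)  # ground truth: names visible in the image
    gt_pairs: set[Pair] = field(default_factory=set)       # ground-truth name-verb associations
    auto_pairs: set[Pair] = field(default_factory=set)     # automatically extracted name-verb pairs

    def absent_names(self) -> set[str]:
        """Names mentioned in the caption that do not appear in the image."""
        return self.caption_names - self.names_in_image

    def missed_pairs(self) -> set[Pair]:
        """Ground-truth pairs that the automatic extraction did not recover."""
        return self.gt_pairs - self.auto_pairs


# Toy example with placeholder names, not taken from the dataset.
item = AnnotatedItem(
    caption="A hits the ball while B watches",
    caption_names={"A", "B"},
    names_in_image={"A"},
    gt_pairs={("A", "hit"), ("B", "watch")},
    auto_pairs={("A", "hit")},
)
print(item.absent_names())  # {'B'}
print(item.missed_pairs())  # {('B', 'watch')}
```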
Important Notice
These images were downloaded from the internet and may be subject to copyright. We do not own the copyright of the images and provide them only for non-commercial research purposes.
Downloads
Filename | Description | Release Date | Size |
---|---|---|---|
data.tar.gz | Dataset of images and captions in text format (including ground-truth person locations) | 23 April 2010 | 52.8 MB |
captions.mat | Captions, automatically extracted name-verb pairs, and ground-truth name-verb pairs and locations in MAT-file format. | 23 April 2010 | 260.2 KB |
bbx.mat | Detected face and upper-body bounding-boxes in MAT-file format. | 23 April 2010 | 129.1 KB |
dictionary.mat | A list of frequent names and verbs considered in [1]. | 23 April 2010 | 2.6 KB |
README.txt | Description of contents. | 23 April 2010 | 6.1 KB |
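The following sketch unpacks `data.tar.gz` using only the Python standard library. It assumes nothing about the directory layout inside the archive, which is presumably described in README.txt. The helper name `extract_archive` is hypothetical. Before extracting, it rejects any member whose path would land outside the destination directory. When the installed Python supports tarfile extraction filters, it also applies the `"data"` filter.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        # Reject members whose path would escape dest_dir (e.g. "../x" or absolute paths).
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the safer "data" extraction filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Return regular files only, sorted for a stable order.
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

For example, `extract_archive("data.tar.gz", "faces_and_poses")` would unpack the images and captions into a local `faces_and_poses` directory. The `.mat` files can be opened directly in MATLAB, or in Python with `scipy.io.loadmat`. Their variable names are not assumed here.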
Related Publications and Software
[1] L. Jie, B. Caputo and V. Ferrari. Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation. In Advances in Neural Information Processing Systems 22 (NIPS), 2009.
[2] K. Deschacht and M.-F. Moens. Semi-supervised semantic role labeling using the latent words language model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.
[3] http://torch3vision.idiap.ch/
[4] http://www.robots.ox.ac.uk/~vgg/software/UpperBody/
Acknowledgements
This work is funded by the Swiss National Science Foundation (SNSF).