Idiap Research Institute
ETH Zurich, CALVIN group

Idiap/ETHZ Faces and Poses Dataset


A corpus of news items for automatic face and pose annotation.



Luo Jie,    Barbara Caputo,    Vittorio Ferrari



Overview

Annotated example

This dataset contains 1703 image-caption pairs and was first used in [1]. Captions contain the names of some of the persons appearing in the corresponding image, as well as verbs indicating what they are doing. The images were collected by querying Google Images with keywords generated by combining different names (sports stars and politicians) and verbs (from sports and social interactions). The captions are derived from the text snippets returned by Google Images. They typically mention the action of at least one person in the image, but may also contain names and verbs that do not appear in the image.
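The keyword generation described above can be illustrated with a minimal sketch. It assumes every name is paired with every verb and that the two are joined by a space. Neither detail is stated on this page, and `build_queries` is a hypothetical helper, not part of the release.

```python
from itertools import product


def build_queries(names, verbs):
    """Combine every name with every verb into one query string each.

    Assumption: full cross product, joined by a single space.
    """
    return [f"{name} {verb}" for name, verb in product(names, verbs)]


# Placeholder inputs for illustration only; the names and verbs actually
# considered in [1] are listed in dictionary.mat.
names = ["<name 1>", "<name 2>"]
verbs = ["<verb 1>", "<verb 2>"]
queries = build_queries(names, verbs)
print(len(queries), "queries generated")
```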
In addition to the image-caption pairs, this release also includes the supplementary files listed under Downloads below.


Important Notice

These images were downloaded from the internet and may be subject to copyright. We do not own the copyright of the images and provide them only for non-commercial research purposes.


Downloads

Filename | Description | Release Date | Size
data.tar.gz | Dataset of images and captions in text format (including ground-truth person locations) | 23 April 2010 | 52.8 MB
captions.mat | Captions, automatically extracted name-verb pairs, and ground-truth name-verb pairs and locations in MAT-File format | 23 April 2010 | 260.2 KB
bbx.mat | Detected face and upper-body bounding boxes in MAT-File format | 23 April 2010 | 129.1 KB
dictionary.mat | A list of frequent names and verbs considered in [1] | 23 April 2010 | 2.6 KB
README.txt | Description of contents | 23 April 2010 | 6.1 KB
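One possible way to take a first look at the downloaded files from Python is sketched below. It assumes the files sit in the current directory and that the MAT-files are in a format SciPy can read. The variable names stored inside the MAT-files are not documented on this page, so the sketch only lists whatever is there; README.txt remains the authoritative description. `list_archive_members` and `list_mat_variables` are hypothetical helper names.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path, limit=10):
    """Return the names of the first `limit` members of a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [member.name for member in tar.getmembers()[:limit]]


def list_mat_variables(mat_path):
    """Return (name, shape, dtype) for each variable stored in a MAT-file."""
    # SciPy is imported lazily so the archive helper works without it.
    from scipy.io import whosmat
    return whosmat(str(mat_path))


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    if Path("data.tar.gz").exists():
        print(list_archive_members("data.tar.gz"))
    for name in ("captions.mat", "bbx.mat", "dictionary.mat"):
        if Path(name).exists():
            for variable in list_mat_variables(name):
                print(name, variable)
```

Listing the variables first avoids guessing at field names; once they are known, `scipy.io.loadmat` can load the actual contents.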

Related Publications and Software

[1] L. Jie, B. Caputo and V. Ferrari.
     Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation
     In Advances in Neural Information Processing Systems 22 (NIPS), 2009.

[2] K. Deschacht and M.-F. Moens.
     Semi-supervised semantic role labeling using the latent words language model
     In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.

[3] http://torch3vision.idiap.ch/

[4] http://www.robots.ox.ac.uk/~vgg/software/UpperBody/


Acknowledgements

This work is funded by the Swiss National Science Foundation (SNSF).

Please report problems with this page to Luo Jie or Vittorio Ferrari.
Last updated 24th April 2010.