I. Human Pose Estimation in Still Images 1.22
II. Quick start - pose estimation given an input upper-body detection
III. Quick start - fully automatic upper-body detection and pose estimation
IV. Installing the mex files
V. Introduction
VI. Full Body Support
VII. Software Functions
VIII. Pre-trained Model Parameters
IX. Support
X. References
XI. Version history


I. Human Pose Estimation in Still Images 1.22


Research by V. Ferrari, M. Eichner, M. Marin, and A. Zisserman,

supported by CLASS and SNSF.


Software by V. Ferrari and M. Eichner,

building on components by Deva Ramanan (image parsing [1]),

and by Varun Gulshan, Pushmeet Kohli and Vladimir Kolmogorov (GrabCut [6]).  


This code was developed under Linux with Matlab R2008b.

We have also successfully run it under Windows XP with Matlab R2008b (see 'Installing the mex files')

and under MacOSX 10.5 Leopard with Matlab R2008a.

There is no guarantee the code will run on other operating systems or Matlab versions.



II. Quick start - pose estimation given an input upper-body detection


This release contains all you need to do pose estimation starting from a bounding-box around the upper-body of a person, which you must provide as input.

0. Let <dir> be the directory where you uncompressed the .tgz release archive.



1. Start Matlab



2. Execute the following commands:


cd <dir>/code

startup



3. If you are running the code for the first time then execute:


installmex


If this causes problems, see the section 'Installing the mex files'.



4. Your Matlab environment is now ready for running the pose estimator.

To run on the provided example execute:


cd ../example_data;

% upper-body parsing

[T sticks_imgcoor] = PoseEstimStillImage(pwd, 'images', '%06d.jpg', 0, 'ubf', [343 27 242 217]', fghigh_params, parse_params_Buffy3and4andPascal, [], pm2segms_params, true);

% full-body parsing

[T sticks_imgcoor] = PoseEstimStillImage(pwd, 'images', '%06d.jpg', 1, 'full', [119 70 158 142]', fghigh_params, parse_params_Buffy3and4andPascal, [], pm2segms_params, true);


This should produce output in <dir>/example_data/segms_ubf/000000.jpg and <dir>/example_data/segms_full/000001.jpg

If this output matches exactly <dir>/000000_stickman.jpg and <dir>/000001_stickman.jpg, then the pose estimator is working perfectly.


The coordinates [343 27 242 217] and [119 70 158 142] represent the input upper-body detections.
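Before running the estimator on your own images, it can help to check that a detection window is plausible. The lines below are only a sketch and not part of the release: the image size is a hypothetical placeholder, and the assumption that the box must lie fully inside the image is ours, not a documented requirement of PoseEstimStillImage.

```matlab
% Upper-body detection from the quick start, as a column vector [x y width height]'.
bb = [343 27 242 217]';

% Hypothetical image size [rows cols]; in practice use size(imread(...)).
img_size = [480 720];

% The quick-start calls pass the window as a 4x1 column vector.
is_column = isequal(size(bb), [4 1]);

% Width and height must be positive for the window to cover any pixels.
has_area = bb(3) > 0 && bb(4) > 0;

% Assumption: x grows to the right and y downwards, so the window spans
% columns bb(1)..bb(1)+bb(3) and rows bb(2)..bb(2)+bb(4).
x_max = bb(1) + bb(3);
y_max = bb(2) + bb(4);
inside = bb(1) >= 1 && bb(2) >= 1 && x_max <= img_size(2) && y_max <= img_size(1);

fprintf('column vector: %d, non-empty: %d, inside image: %d\n', ...
        double(is_column), double(has_area), double(inside));
```

If any of the three checks fails, inspect the window (for example by drawing it over the image) before passing it to the estimator.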





III. Quick start - fully automatic upper-body detection and pose estimation


This release can be used in conjunction with our upper-body detector [5] to give a fully automatic pipeline taking just an image as input.

0. Let <dir> be the directory where you uncompressed the .tgz release archive of the pose estimator.


1. Install the Upper Body Detector [5], following the quick start instructions within its readme.


2. Start Matlab


3. Execute the following commands:


cd <dir>/code

startup



4. If you are running the code for the first time then execute:


installmex


If this causes problems, see the section 'Installing the mex files'.



5. At this point both the upper-body detector and the human pose estimation software should be ready
(i.e. all parameters loaded, all files mexed, all paths set).
If step 6 below results in an error, something has not been set up.


6. Execute
DetectAndEstimDir('<dir>/example_data/images','replace_me_with_path_to_upperbody_detector/code/pff_model_upperbody_final.mat','replace_me_with_path_to_upperbody_detector/code/haarcascade_frontalface_alt2.xml',det_pars,'ubf',fghigh_params,parse_params_Buffy3and4andPascal,[],pm2segms_params,1);

This will process all images in the directory <dir>/example_data/images and produce the following output in the same directory:
- dets/ : directory containing person detections
- fghigh_classname/ poses_classname/ segms_classname/ : directories containing foreground highlighting [2], pose maps, and output stickmen overlays respectively for all images in the directory. These overlays are provided as visualizations of what the algorithm does.
- a single file for each detection and image in the format: $imagename_$detnum$imgextension 
- $imagename$imgextension$_pms.mat - results for all detections per image (details of its contents below)

----------------------------------------------------------------------------------------
For estimating full bodies use 'full' instead of 'ubf' when running DetectAndEstimDir.
You should not change the person detector. DetectAndEstimDir automatically derives a
full body detection window from the upper-body detections [5].
----------------------------------------------------------------------------------------


IV. Installing the mex files


In case of problems with the installmex.m script during compilation of 'mexDGC.cpp' or 'nema_lognorm_fast.cxx' try:


1) change the mex compiler using the command: mex -setup

   and select a new compiler from the on-screen options.

   Under Windows XP this code should compile using Visual C++ 2008 Express Edition.


2) if 1) doesn't help then switch off the foreground highlighting stage by setting

   parse_params_Buffy3and4andPascal.use_fg_high = false;


Unfortunately, if 'triLinearInterpolation.cpp', 'triLinearVoting.cpp' or 'vgg_nearest_neighbour_dist.cxx'

are not successfully compiled, you will not be able to run our software.
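
To see which mex files are missing after running installmex, a check along the following lines may help. This is only a sketch and not part of the release. It assumes that each compiled mex function has the same name as its source file, which we have not verified against installmex.m.

```matlab
% Assumption: each compiled MEX function is named after its source file.
optional_mex = {'mexDGC', 'nema_lognorm_fast'};
required_mex = {'triLinearInterpolation', 'triLinearVoting', ...
                'vgg_nearest_neighbour_dist'};

% exist(name, 'file') returns 3 when name is a MEX-file on the Matlab path.
is_compiled = @(name) exist(name, 'file') == 3;

% Keep only the names for which no compiled MEX-file was found.
missing_required = required_mex(~cellfun(is_compiled, required_mex));
missing_optional = optional_mex(~cellfun(is_compiled, optional_mex));

if ~isempty(missing_required)
    % Without these the software cannot run.
    fprintf('Missing required mex file: %s\n', missing_required{:});
elseif ~isempty(missing_optional)
    % These two are the files named in the installmex troubleshooting above.
    fprintf('Missing optional mex file: %s\n', missing_optional{:});
    disp('Consider setting parse_params_Buffy3and4andPascal.use_fg_high = false;');
else
    disp('All mex files found.');
end
```

Run it after startup, so that the code directories are already on the Matlab path.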



V. Introduction

We release here software for articulated human pose estimation in still images. Our algorithm [9] is designed to operate in uncontrolled images with difficult illumination conditions and cluttered backgrounds. People can appear at any location and scale in the image, and can wear any kind of clothing, in any colour/texture. The only assumption the algorithm makes is that people are upright (i.e. their head is above their torso) and they are seen approximately from a frontal viewpoint.


The input to this software is an image and a bounding-box around the head and shoulders of a person in the image. This window can be obtained by using our upper-body detector [5], or by any other means. The combination of such a generic, person-independent detector and our software allows for fully automatic pose estimation in uncontrolled images, without knowing the location, scale, or appearance (clothing, skin color) of the person, and without the need for background subtraction. The output of our system is a set of line segments indicating location, size and orientation of the body parts (stickmen).


In this release both upper bodies and full bodies are fully supported (near-frontal or near-back views).


v1.22:

This release matches our latest version reported in [9] and supports full bodies (see section VI). As we noticed only a marginal influence of the repulsive model [3], we removed it. Until v1.05 the software returned T.PM.sticks, a stickman derived from the posterior marginals of the model. In this release, the software also outputs a second stickman in T.PM.MAP.sticks, corresponding to the approximate MAP of the model. The MAP stickman always has arms of the same length with respect to the detection window, whereas the marginal stickman adapts to the image data. On the other hand, the MAP stickman is always well-formed, respecting the kinematic constraints of the human body, whereas the marginal stickman can occasionally break them.


This release is designed to be used in conjunction with our upper-body detector [5] to give a fully automatic pipeline taking just an image as input. After installing [5], you can directly run the full pipeline on an image directory by calling a single Matlab function (DetectAndEstimDir.m, see section III).




VI. Full Body Support

This release supports full body pose estimation of near-frontal and near-back views. The algorithm estimates the pose of 10 body parts (left/right upper/lower legs have been added). Changing the classname parameter from 'ubf' to 'full' when calling the algorithm effectively switches from upper-body to full-body mode. No other changes are necessary, as the full-body mode also inputs upper-body detection windows [5].

For this release we did not train our procedure [4] for obtaining person-specific color models for legs. This is not a problem: the software runs properly without them. Legs are estimated using the original iterative parsing technique of [1]. Please ignore the following warnings in the output:
WARNING: empty color model of limb class 3
WARNING: empty color model of limb class 5



VII. Software Functions

DetectAndEstimDir

-----------------

This is the top-level function to call for running the complete, fully automatic human detection and pose estimation pipeline on a directory of images. It stores the results for each image separately. It requires our upper body detector [5] to be installed and accessible through the Matlab path.


DetectAndEstimDir(img_dir, pffubfmodel_path, facemodel_path, det_pars,classname, fghigh_pars,parse_pars, [], segm_pars, verbose)


Input:

img_dir - path to the directory containing images

pffubfmodel_path - relative/absolute path to the pretrained upper body part-based model

facemodel_path - (optional) relative/absolute path to the pretrained opencv face model (xml file)
                 if [] then skip face detection

det_pars - detection parameters provided with the detector

...

remaining parameters - look into the description of PoseEstimStillImage



PoseEstimStillImage

-------------------

This is the function you should call if you already have an upper-body detection bounding-box (e.g. from [5]).


[T sticks_imgcoor] = PoseEstimStillImage(base_dir, img_dir, img_fname_format, img_number, classname, bb, fghigh_pars, parse_pars, [], segm_pars, verbose)


Input:
base_dir - directory where the results should be stored

img_dir - name of the directory with images relative to the base_dir

img_fname_format - format of the image files in the img_dir, e.g. %06d.jpg means that images are in the format xxxxxx.jpg (where x is a digit)

img_number - image number, used together with img_fname_format to determine the filename of the image to process

classname - either 'ubf', or 'full' - used as a selector for proper parameters.

            'ubf' = upper body front/back, 'full' = full body front/back

bb - window around object of interest [x y width height]';

     for classname == 'ubf' and 'full' use the output of the CALVIN upper-body detector [5]; the bounding box should be around the head and shoulders (see [2-4] for examples)

fghigh_pars - parameters of the foreground highlighting algorithm (used if it is switched on)

parse_pars - parameters of the parsing algorithm

segm_pars - parameters of the routine that derives hard segmentations and stickmen from posterior marginals (called 'pose maps' in our code)

verbose - 0 = no output, 1 = text output, 2 = displaying intermediate figures
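
The inputs base_dir, img_dir, img_fname_format and img_number together identify one image file. The sketch below shows how we read that description. It is an illustration rather than code taken from the release, and base_dir here is a hypothetical location.

```matlab
% Hypothetical base directory; the other values follow the quick start.
base_dir = '/tmp/example_data';
img_dir = 'images';
img_fname_format = '%06d.jpg';
img_number = 1;

% The format string pads the image number with zeros to six digits.
img_fname = sprintf(img_fname_format, img_number);

% Assumption: the image is looked up as base_dir/img_dir/img_fname.
img_path = fullfile(base_dir, img_dir, img_fname);
disp(img_path);
```

With img_number = 1 the file name is 000001.jpg, which matches the image used by the full-body quick-start call.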


Output:


sticks_imgcoor : sticks coordinates automatically estimated from the posterior marginals (see [2]):

                 sticks are formatted in the following way: sticks(:,n) = [x1 y1 x2 y2]' where n denotes the body part:

                 'ubf' mode

                 1 - torso, 2 - left upper arm, 3 - right upper arm, 4 - left lower arm, 5 - right lower arm, 6 - head

                 'full' mode

                 1 - torso, 2 - left upper arm, 3 - right upper arm, 4 - left upper leg, 5 - right upper leg, 

                 6 - left lower arm, 7 - right lower arm, 8 - left lower leg, 9 - right lower leg, 10 - head

                 sticks coordinates are in the image coordinate system.


T.PM.sticks : as sticks_imgcoor but sticks coordinates are relative to the pose map (T.PM.respIm) coordinate frame

T.D : detection window, as a column vector [imgnumber,bbx,bby,bbwidth,bbheight,0,0,0,classid]'

T.CM : estimated color models

T.PM : compressed pose map structure, decompress with UncompressPM before using        

T.PM.respIm : decompress using UncompressRespIm before using.

              it contains posterior marginal probabilities for each body part over the search space (y,x,theta).

              T.PM.respIm(y,x,theta,p) is a 4-D array with dimensions Y,X (spatial location) x Theta (orientation) x P (body part).

              The Y,X coordinates are expressed in the coordinate frame of the detection window,

              after enlarging as described in [2] and rescaling to a fixed dimension.

              The (y,x) coordinates refer to the center of the part.

              See ShowRespIm.m for more details.

T.PM.a : estimated pose visualization, body part classes are distributed over the color planes (see [1-4])

T.PM.b : soft-segmentation masks derived from .respIm, by convolving it with rectangles representing the body parts. 

         See ShowParseResult.m for details.

T.PM.e : total pose entropy, it is a measure of confidence in the estimated pose

T.PM.p : total pixel confidence

T.PM.bb : bounding box of the enlarged area derived from the detection window [x y width height], so as to cover the whole object

          (e.g. the whole upper-body, rather than only the head and shoulders)

T.PM.MAP.sticks - approx. MAP stickman (alternative to T.PM.sticks) in the same order as sticks_imgcoor but coordinates relative to the pose map 
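
Since every stickman uses the layout sticks(:,n) = [x1 y1 x2 y2]', measuring body parts takes only a few lines. The sketch below uses made-up coordinates for an 'ubf' stickman. The same computation should apply to sticks_imgcoor, T.PM.sticks and T.PM.MAP.sticks, each in its own coordinate frame.

```matlab
% Synthetic upper-body stickman; each column is [x1 y1 x2 y2]' (made-up values).
sticks = [100  80 120  50 150 100;   % x1
           60  60  60 100 100  20;   % y1
          100  50 150  50 150 100;   % x2
          120 100 100 130 130  50];  % y2

% Part order in 'ubf' mode, as listed above.
part_names = {'torso', 'left upper arm', 'right upper arm', ...
              'left lower arm', 'right lower arm', 'head'};

% Euclidean length of each stick, one value per body part.
dx = sticks(3, :) - sticks(1, :);
dy = sticks(4, :) - sticks(2, :);
lengths = sqrt(dx .^ 2 + dy .^ 2);

for n = 1:numel(part_names)
    fprintf('%-16s %6.1f\n', part_names{n}, lengths(n));
end
```

Comparing the arm lengths of the marginal and MAP stickmen this way is one simple check of how much the marginal stickman has adapted to the image.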


After successful software execution, several output folders are created in base_dir:


fghigh_classname (only if parse_pars.use_fg_high == true):

processed images with highlighted foreground area, together with detection window and enlarged detection window (see [2]).


poses_classname:

processed images with overlaid posterior marginals and approx. MAP stickman obtained by parsing with our model.


segms_classname:

visualization of the estimated hard segmentations of posterior marginals and stickmen derived from them.



ShowRespIm

----------

ShowRespIm(respIm)


Displays the posterior marginal probabilities for each body part over the search space (y,x,theta), e.g. ShowRespIm(UncompressRespIm(T.PM.respIm)).
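
Besides displaying the marginals, you may want to read off the most probable state of each part. The sketch below does this on a small synthetic array that stands in for the output of UncompressRespIm. The array sizes are made up, and the peak of a marginal is not necessarily where the software places the corresponding stick.

```matlab
% Synthetic 4-D array standing in for UncompressRespIm(T.PM.respIm);
% the sizes are made up: Y x X x Theta x P.
respIm = zeros(5, 6, 4, 2);
respIm(2, 3, 1, 1) = 0.9;   % peak for part 1
respIm(4, 5, 3, 2) = 0.7;   % peak for part 2

num_parts = size(respIm, 4);
peaks = zeros(num_parts, 3);   % one row per part: [y x theta_index]

for p = 1:num_parts
    marginal = respIm(:, :, :, p);
    % Linear index of the largest value, converted back to subscripts.
    [peak_val, idx] = max(marginal(:));
    [y, x, t] = ind2sub(size(marginal), idx);
    peaks(p, :) = [y x t];
end

disp(peaks);
```

The third column is an index into the orientation dimension, not an angle; how indices map to angles is not covered here.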



ShowParseResult

---------------

ShowParseResult(pm, show_whole, show_parts)


Displays the estimated pose visualization (body part classes are distributed over the color planes) and soft-segmentation masks derived from .respIm, by convolving it with rectangles representing the body parts.



VIII. Pre-trained Model Parameters

The Location Priors and Appearance Transfer mechanism [4] used during pose estimation requires a training stage. We provide here 3 sets of models, which have been trained on different datasets:


- Buffy2to6andPascal

  trained on all 5 episodes from the Buffy Stickmen dataset [7] and the whole ETHZ Pascal Stickmen dataset [8].

  This model is suitable for pose estimation on any dataset except for quantitative evaluation on ETHZ Pascal Stickmen and Buffy Stickmen

  (since these were used for training).


- Buffy2to6

  trained on all 5 episodes of the Buffy Stickmen dataset.

  This model is suitable for evaluation on the ETHZ Pascal Stickmen dataset, or on any other dataset except Buffy Stickmen.


- Buffy3and4andPascal

  trained on episodes 3 and 4 from Buffy Stickmen and the whole of ETHZ Pascal Stickmen.

  This model is suitable for evaluation on episodes 2+5+6 from the Buffy Stickmen dataset, which

  form the official test set as evaluated in [2,3,4,9]. This model is of course also suitable for evaluation on any other dataset.


Models are provided as three distinct parameter files: parse_params_{Buffy2to6andPascal,Buffy2to6,Buffy3and4andPascal}.

Manually altering the parameter values might result in unpredictable results. We recommend using the parameters provided.
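
The choice between the three models can be summarized as a small lookup. The sketch below is an illustration only. The dataset labels 'buffy' and 'pascal' are hypothetical names introduced here, not identifiers used by the software.

```matlab
% Hypothetical label for the dataset you evaluate on:
% 'buffy', 'pascal', or anything else.
eval_dataset = 'buffy';

switch lower(eval_dataset)
    case 'buffy'
        % Episodes 2, 5 and 6 are held out as the official test set.
        model = 'Buffy3and4andPascal';
    case 'pascal'
        % Trained on Buffy only, so ETHZ Pascal Stickmen stays unseen.
        model = 'Buffy2to6';
    otherwise
        % Trained on the most data; fine when neither dataset is the test set.
        model = 'Buffy2to6andPascal';
end

% Name of the parameter set to pass as the parsing parameters.
params_name = ['parse_params_' model];
fprintf('Use %s\n', params_name);
```

The resulting name is the parameter set to pass in the parse_pars position of PoseEstimStillImage or DetectAndEstimDir.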






IX. Support

For support please contact us:


eichner@vision.ee.ethz.ch

ferrari@vision.ee.ethz.ch


Have fun!



X. References


[1] D. Ramanan.

    Learning to parse images of articulated bodies

    NIPS 2006.


[2] V. Ferrari, M. Marin-Jimenez, A. Zisserman

    Progressive search space reduction for human pose estimation

    CVPR 2008.


[3] V. Ferrari, M. Marin-Jimenez, A. Zisserman

    Pose Search: retrieving people using their pose

    CVPR 2009.


[4] M. Eichner, V. Ferrari

    Better appearance models for pictorial structures

    BMVC 2009.


[5] M. Eichner, V. Ferrari

    CALVIN Upper-body detector

    http://www.vision.ee.ethz.ch/~calvin/calvin_upperbody_detector/


[6] C. Rother, V. Kolmogorov, and A. Blake

    Grabcut - interactive foreground extraction using iterated graph cuts.

    Siggraph 2004.


[7] http://www.robots.ox.ac.uk/~vgg/data/stickmen/


[8] http://www.vision.ee.ethz.ch/~calvin/ethz_pascal_stickmen/


[9] M. Eichner, M. Marin-Jimenez, A. Zisserman, V. Ferrari

    Articulated Human Pose Estimation and Search in (Almost) Unconstrained Still Images
    ETH Zurich, D-ITET, BIWI, Technical Report No.272, September 2010.



XI. Version history

1.22
---
- pm2segms updated to work with recent Matlab versions (the Matlab watershed function now returns uint8)

1.21
---
- full body parsing quick-start added
- class-specific output directories

1.2
---
- system from [9]
- full-body parsing support added
- repulsive model removed
- alternative approximate MAP stickman output


1.05
----
- DetectAndEstimDir - stores images with the detection overlays
- DetectAndEstimDir - fixed bug in results saving


1.04
----
- DetectAndEstimDir function added, making it easy to interface with the upper-body detector [5]
  and to run the full pipeline over all images in a directory
- text readme replaced with this html version


1.03

----

- initial public release

best system from [4] (see section 6, page 10 in the paper) with a few minor corrections (below).

As described in [4], this release includes a procedure for obtaining person-specific body part color models

before running pictorial structures inference. Therefore the initial parsing stage of [1,2,3], which used just edges, is skipped. This release also includes the refinements to the standard pictorial structure model which we presented in [3] (e.g. the repulsive graph edges to reduce double-counting). For further details please refer to [1,2,3,4].

Finally, this release differs slightly from the best one in [4], because of the following corrections:

    - additional adjustments of the kinematic prior after the rescaling correction [4]

    - bug fix in message passing inherited from [1]


- code clean up

- 3 pretrained LPAT models [4]




1.02

----

- initial friends release

- this README

- structure of the parameters cleaned up

- system brought to the state presented in BMVC 2009

  (including full color models via appearance transfer between parts, and better rescaling)

- fixed bug in message passing from [1]

- kinematic priors adjustments after rescaling corrections



1.01

----

- color models via body part location priors and appearance transfer mechanism



1.0

---

- initial internal release

