Research by V. Ferrari, M. Eichner, M. Marin, and A. Zisserman,
supported by CLASS and SNSF.
Software by V. Ferrari and M. Eichner,
building on components by Deva Ramanan (image parsing [1]),
and by Varun Gulshan, Pushmeet Kohli and Vladimir Kolmogorov (GrabCut [6]).
This code was developed under Linux with Matlab R2008b.
We have also successfully run it under Windows XP with Matlab R2008b (see 'Installing the mex files')
and under MacOSX 10.5 Leopard with Matlab R2008a.
There is no guarantee the code will run on other operating systems or Matlab versions.
0. Let <dir> be the directory where you uncompressed the .tgz release archive.
1. Start Matlab
2. Execute the following commands:
cd <dir>/code
startup
3. If you are running the code for the first time, then execute:
installmex
If this causes problems, see the section 'Installing the mex files'.
4. Your Matlab environment is now ready to run the pose estimator.
To run on the provided example, execute:
cd ../example_data;
% upper-body parsing
[T sticks_imgcoor] = PoseEstimStillImage(pwd, 'images', '%06d.jpg', 0, 'ubf', [343 27 242 217]', fghigh_params, parse_params_Buffy3and4andPascal, [], pm2segms_params, true);
% full-body parsing
[T sticks_imgcoor] = PoseEstimStillImage(pwd, 'images', '%06d.jpg', 1, 'full', [119 70 158 142]', fghigh_params, parse_params_Buffy3and4andPascal, [], pm2segms_params, true);
This should produce output in <dir>/example_data/segms_ubf/000000.jpg and <dir>/example_data/segms_full/000001.jpg.
If these outputs exactly match <dir>/000000_stickman.jpg and <dir>/000001_stickman.jpg, then the pose estimator is working correctly.
The coordinates [343 27 242 217]' and [119 70 158 142]' are the input upper-body detection windows, given as [x y width height]' column vectors.
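To run on your own image, only the image location and the detection window change. A minimal sketch (the file layout and window values below are placeholders):
% detection window around head and shoulders, as [x y width height]'
bb = [343 27 242 217]';
% run upper-body pose estimation on image 000000.jpg in ./images
[T sticks_imgcoor] = PoseEstimStillImage(pwd, 'images', '%06d.jpg', 0, 'ubf', bb, fghigh_params, parse_params_Buffy3and4andPascal, [], pm2segms_params, true);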
Installing the mex files
------------------------
In case of problems with the installmex.m script during compilation of 'mexDGC.cpp' or 'nema_lognorm_fast.cxx', try the following:
1) change the mex compiler using the command: mex -setup
and select a new compiler from the on-screen options.
Under Windows XP this code should compile using Visual C++ 2008 Express Edition.
2) if 1) doesn't help, switch off the foreground highlighting stage by setting
parse_params_Buffy3and4andPascal.use_fg_high = false;
Unfortunately, if 'triLinearInterpolation.cpp', 'triLinearVoting.cpp', or 'vgg_nearest_neighbour_dist.cxx'
do not compile successfully, you will not be able to run our software.
We release here software for articulated human pose estimation in still images. Our algorithm [9] is designed to operate in uncontrolled images with difficult illumination conditions and cluttered backgrounds. People can appear at any location and scale in the image, and can wear any kind of clothing, in any colour/texture. The only assumption the algorithm makes is that people are upright (i.e. their head is above their torso) and they are seen approximately from a frontal viewpoint.
The input to this software is an image and a bounding-box around the head and shoulders of a person in the image. This window can be obtained by using our upper-body detector [5], or by any other means. The combination of such a generic, person-independent detector and our software allows for fully automatic pose estimation in uncontrolled images, without knowing the location, scale, or appearance (clothing, skin color) of the person, and without the need for background subtraction. The output of our system is a set of line segments indicating the location, size, and orientation of the body parts (stickmen).
In this release both upper bodies and full bodies are fully supported (near-frontal or near-back views).
v1.22:
This release matches the latest version of our method as reported in [9] and supports full bodies (see section VI). Since we observed only a marginal influence of the repulsive model [3], we removed it. Until v1.05 the software returned T.PM.sticks, a stickman derived from the posterior marginals of the model. This release additionally outputs a second stickman in T.PM.MAP.sticks, corresponding to the approximate MAP of the model. The MAP stickman always has arms of the same length relative to the detection window, whereas the marginal stickman adapts to the image data. On the other hand, the MAP stickman is always well-formed, respecting the kinematic constraints of the human body, whereas the marginal stickman can occasionally violate them.
This release is designed to be used in conjunction with our upper-body detector [5] to give a fully automatic pipeline taking just an image as input. After installing [5], you can directly run the full pipeline on an image directory by calling a single Matlab function (DetectAndEstimDir.m, see section III).
DetectAndEstimDir
-----------------
This is the top-level function to call to run the complete, fully automatic human detection and pose estimation pipeline on a directory of images; it stores the results for each image separately. It requires our upper-body detector [5] to be installed and accessible through the Matlab path.
DetectAndEstimDir(img_dir, pffubfmodel_path, facemodel_path, det_pars, classname, fghigh_pars, parse_pars, [], segm_pars, verbose)
Input:
img_dir - path to the directory containing images
pffubfmodel_path - relative/absolute path to the pretrained upper body part-based model
facemodel_path - (optional) relative/absolute path to the pretrained opencv face model (xml file)
if [] then skip face detection
det_pars - detection parameters provided with the detector
...
remaining parameters - see the description of PoseEstimStillImage below
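A minimal example call, assuming the detector [5] is installed and defines det_pars; the path values below are placeholders for the model files shipped with the detector:
% hypothetical paths to the pretrained models from [5]
pffubfmodel_path = 'path/to/upperbody_model';
facemodel_path = 'path/to/frontalface_model.xml';  % or [] to skip face detection
DetectAndEstimDir('path/to/images', pffubfmodel_path, facemodel_path, det_pars, 'ubf', fghigh_params, parse_params_Buffy3and4andPascal, [], pm2segms_params, true);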
PoseEstimStillImage
-------------------
This is the function you should call if you already have an upper-body detection bounding-box (e.g. from [5]).
[T sticks_imgcoor] = PoseEstimStillImage(base_dir, img_dir, img_fname_format, img_number, classname, bb, fghigh_pars, parse_pars, [], segm_pars, verbose)
Input:
base_dir - directory where the results should be stored
img_dir - name of the directory with images relative to the base_dir
img_fname_format - format of the image files in the img_dir, e.g. %06d.jpg means that images are in the format xxxxxx.jpg (where x is a digit)
img_number - image number, used together with img_fname_format to determine the filename of the image to process
classname - either 'ubf', or 'full' - used as a selector for proper parameters.
'ubf' = upper body front/back, 'full' = full body front/back
bb - window around object of interest [x y width height]';
for classname == 'ubf' and 'full' use the output from the CALVIN upper-body detector [5] (the bounding box should be around the head and shoulders; see [2-4] for examples)
fghigh_pars - parameters of the foreground highlighting algorithm (used only if it is switched on)
parse_pars - parameters of the parsing algorithm
segm_pars - parameters of the routine that derives hard segmentations and stickmen from the posterior marginals (called 'pose maps' in our code)
verbose - 0 = no output, 1 = text output, 2 = displaying intermediate figures
Output:
sticks_imgcoor : stick coordinates automatically estimated from the posterior marginals (see [2]):
sticks are formatted in the following way: sticks(:,n) = [x1 y1 x2 y2]', where n denotes the body part:
'ubf' mode
1 - torso, 2 - left upper arm, 3 - right upper arm, 4 - left lower arm, 5 - right lower arm, 6 - head
'full' mode
1 - torso, 2 - left upper arm, 3 - right upper arm, 4 - left upper leg, 5 - right upper leg,
6 - left lower arm, 7 - right lower arm, 8 - left lower leg, 9 - right lower leg, 10 - head
All stick coordinates are expressed in the image coordinate system.
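For instance, since each column of sticks_imgcoor holds the two endpoints of one stick, the estimate can be overlaid on the image with standard Matlab plotting; a sketch, assuming the 'ubf' example from the quick start:
img = imread(fullfile('images', '000000.jpg'));
imshow(img); hold on;
for n = 1:size(sticks_imgcoor, 2)
  s = sticks_imgcoor(:, n);                   % [x1 y1 x2 y2]'
  plot(s([1 3]), s([2 4]), 'LineWidth', 3);   % one segment per body part
  text(s(1), s(2), num2str(n), 'Color', 'w'); % part index, see the list above
end
hold off;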
T.PM.sticks : as sticks_imgcoor but sticks coordinates are relative to the pose map (T.PM.respIm) coordinate frame
T.D : detection window, as a column vector [imgnumber,bbx,bby,bbwidth,bbheight,0,0,0,classid]'
T.CM : estimated color models
T.PM : compressed pose map structure, decompress with UncompressPM before using
T.PM.respIm : decompress using UncompressRespIm before using.
It is a 4-D array with dimensions Y,X (spatial location) x Theta (orientation) x P (body part),
where T.PM.respIm(y,x,theta,p) is the posterior marginal probability of body part p
being at location (y,x) with orientation theta.
The Y,X coordinates are expressed in the coordinate frame of the detection window,
after enlarging as described in [2] and rescaling to a fixed dimension.
The (y,x) coordinates refer to the center of the part.
See ShowRespIm.m for more details.
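To inspect the marginals programmatically, here is a minimal sketch (assuming T is the output of PoseEstimStillImage) that reports the most probable (y,x,theta) bin per body part; note the bin indices live in the enlarged, rescaled detection-window frame described above:
respIm = UncompressRespIm(T.PM.respIm);
[Y X TH P] = size(respIm);
for p = 1:P
  [prob ix] = max(reshape(respIm(:,:,:,p), [], 1));  % highest marginal for part p
  [y x th] = ind2sub([Y X TH], ix);
  fprintf('part %d: max marginal %.3f at bin (y=%d, x=%d, theta=%d)\n', p, prob, y, x, th);
end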
T.PM.a : estimated pose visualization, body part classes are distributed over the color planes (see [1-4])
T.PM.b : soft-segmentation masks derived from .respIm, by convolving it with rectangles representing the body parts.
See ShowParseResult.m for details.
T.PM.e : total pose entropy, a measure of confidence in the estimated pose
T.PM.p : total pixel confidence
T.PM.bb : bounding box of the enlarged area derived from the detection window [x y width height], so as to cover the whole object
(e.g. the whole upper-body, rather than only the head and shoulders)
T.PM.MAP.sticks : approximate MAP stickman (alternative to T.PM.sticks); parts are in the same order as in sticks_imgcoor, but coordinates are relative to the pose map
After a successful run, several output folders are created in base_dir:
fghigh_classname (only if parse_pars.use_fg_high == true):
processed images with highlighted foreground area, together with detection window and enlarged detection window (see [2]).
poses_classname:
processed images with overlaid posterior marginals and approx. MAP stickman obtained by parsing with our model.
segms_classname:
visualization of the estimated hard segmentations of posterior marginals and stickmen derived from them.
ShowRespIm
----------
ShowRespIm(respIm)
Displays the posterior marginal probabilities for each body part over the search space (y,x,theta) (e.g. ShowRespIm(UncompressRespIm(T.PM.respIm)))
ShowParseResult
---------------
ShowParseResult(pm, show_whole, show_parts)
Displays the estimated pose visualization (body part classes are distributed over the color planes) and soft-segmentation masks derived from .respIm, by convolving it with rectangles representing the body parts.
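A plausible invocation, assuming from the notes above that pm is the decompressed pose map and that show_whole and show_parts are boolean display flags:
pm = UncompressPM(T.PM);          % decompress the pose map first
ShowParseResult(pm, true, true);  % presumably: whole-pose view and per-part masks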
The Location Priors and Appearance Transfer mechanism [4] used during pose estimation requires a training stage. We provide three sets of models, trained on different datasets:
- Buffy2to6andPascal
trained on all 5 episodes from the Buffy Stickmen dataset [7] and the whole ETHZ Pascal Stickmen dataset [8].
This model is suitable for pose estimation on any dataset except for quantitative evaluation on ETHZ Pascal Stickmen and Buffy Stickmen
(since these were used for training).
- Buffy2to6
trained on all 5 episodes of the Buffy Stickmen dataset.
This model is suitable for evaluation on the ETHZ Pascal Stickmen dataset, or on any other dataset except Buffy Stickmen.
- Buffy3and4andPascal
trained on episodes 3 and 4 from Buffy Stickmen and the whole of ETHZ Pascal Stickmen.
This model is suitable for evaluation on episodes 2+5+6 from the Buffy Stickmen dataset, which
form the official test set as evaluated in [2,3,4,9]. This model is of course also suitable for evaluation on any other dataset.
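To select a model, pass its parameter structure to the estimator; e.g. for evaluation on ETHZ Pascal Stickmen you would use the Buffy2to6 model. The struct name parse_params_Buffy2to6 below is inferred by analogy with parse_params_Buffy3and4andPascal and may differ in your release:
% same call as in the quick start, with the training set swapped
[T sticks_imgcoor] = PoseEstimStillImage(pwd, 'images', '%06d.jpg', 0, 'ubf', [343 27 242 217]', fghigh_params, parse_params_Buffy2to6, [], pm2segms_params, true);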
Manually altering the parameter values may lead to unpredictable behaviour. We recommend using the provided parameters.
For support please contact us:
eichner@vision.ee.ethz.ch
ferrari@vision.ee.ethz.ch
Have fun!
[1] D. Ramanan.
Learning to parse images of articulated bodies
NIPS 2006.
[2] V. Ferrari, M. Marin-Jimenez, A. Zisserman
Progressive search space reduction for human pose estimation
CVPR 2008.
[3] V. Ferrari, M. Marin-Jimenez, A. Zisserman
Pose Search: retrieving people using their pose
CVPR 2009.
[4] M. Eichner, V. Ferrari
Better appearance models for pictorial structures
BMVC 2009.
[5] M. Eichner, V. Ferrari
CALVIN Upper-body detector
http://www.vision.ee.ethz.ch/~calvin/calvin_upperbody_detector/
[6] C. Rother, V. Kolmogorov, and A. Blake
GrabCut: interactive foreground extraction using iterated graph cuts
SIGGRAPH 2004.
[7] http://www.robots.ox.ac.uk/~vgg/data/stickmen/
[8] http://www.vision.ee.ethz.ch/~calvin/ethz_pascal_stickmen/
[9] M. Eichner, M. Marin-Jimenez, A. Zisserman, V. Ferrari
Articulated Human Pose Estimation and Search in (Almost) Unconstrained Still Images
1.03
----
- initial public release
- best system from [4] (see section 6, page 10 in the paper), with a few minor corrections (below)
As described in [4], this release includes a procedure for obtaining person-specific body part color models
before running pictorial structures inference; therefore the initial edge-based parsing stage of [1,2,3] is skipped. This release also includes the refinements to the standard pictorial structure model presented in [3] (e.g. the repulsive graph edges to reduce double-counting). For further details please refer to [1,2,3,4].
Finally, this release differs slightly from the best one in [4], because of the following corrections:
- additional adjustments of the kinematic prior after the rescaling correction [4]
- bug fix in message passing inherited from [1]
- code clean up
- 3 pretrained LPAT models [4]
1.02
----
- initial friends release
- this README
- structure of the parameters cleaned up
- system brought to the state presented in [4] (BMVC 2009)
(including full color models via appearance transfer between parts, and better rescaling)
- fixed bug in message passing from [1]
- kinematic priors adjustments after rescaling corrections
1.01
----
- color models via body part location priors and appearance transfer mechanism
1.0
---
- initial internal release