7.2 Segmentation

Segmentation is the process of grouping individual pixels into regions where an object might be found. A common approach is connected component analysis, which typically produces irregular regions. Because the Viola and Rowley algorithms used for face detection need rectangular regions, a simple algorithm that cuts the flesh tone bit mask into rectangles was used instead [103,83].

Two operators from mathematical morphology are applied to the bit mask: a $3\times3$ erosion operator followed by a $5\times5$ dilation operator. The erosion cuts away thin connections and small regions that are likely to be false positives. The dilation then smooths the bit mask by filling in small holes in the middle of an otherwise acceptably sized region.

All the rows of the bit mask are then logically OR-ed together to form a single row. This step is called vertical separation. Runs of ``1'' values in the single row represent vertical stripes of the image that contain objects of interest, and runs of ``0'' values represent vertical stripes that may be discarded. Within each vertical stripe, the columns are logically OR-ed to create a single column. This step is called horizontal separation. Runs of ``1'' values in the single column mark the rows spanned by a region of interest, so each run, together with its stripe, defines a rectangle. The algorithm can be applied recursively within each rectangle to isolate the rectangular regions of interest more tightly.

In the actual implementation, the horizontal separation steps for all the vertical stripes are done together in an interleaved manner. This converts the column walk across the bitmap into a row walk, which gives better cache performance. Recursion is stopped after two levels, since this has empirically provided adequate results. The flesh tone bitmap is discarded at this stage. The output of this stage is a list of the coordinates of the top left and bottom right corners of the rectangular regions of interest, together with a gray scale version of the image.
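The procedure described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it is a straightforward recursive version without the interleaved, cache-friendly row walk, and the function names (`morph`, `runs`, `separate`, `segment`), the inclusive `(left, top, right, bottom)` rectangle convention, and the treatment of pixels outside the image as 0 are all assumptions made for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]             # rows of 0/1 values
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def morph(mask: Mask, size: int, erode: bool) -> Mask:
    """Square size x size erosion (erode=True) or dilation of a 0/1 mask.

    Assumption of this sketch: pixels outside the image count as 0.
    """
    h, w = len(mask), len(mask[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in range(y - r, y + r + 1)
                for xx in range(x - r, x + r + 1)
            ]
            out[y][x] = int(all(window)) if erode else int(any(window))
    return out


def runs(bits: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs, one per run of 1s."""
    result: List[Tuple[int, int]] = []
    start = None
    for i, b in enumerate(bits):
        if b and start is None:
            start = i
        elif not b and start is not None:
            result.append((start, i - 1))
            start = None
    if start is not None:
        result.append((start, len(bits) - 1))
    return result


def separate(mask: Mask, left: int, top: int, right: int, bottom: int,
             depth: int) -> List[Rect]:
    """Vertical then horizontal separation inside a window, repeated `depth` times."""
    rects: List[Rect] = []
    # Vertical separation: OR all rows of the window into a single row.
    row = [int(any(mask[y][x] for y in range(top, bottom + 1)))
           for x in range(left, right + 1)]
    for rx0, rx1 in runs(row):
        x0, x1 = rx0 + left, rx1 + left
        # Horizontal separation: OR the columns of this stripe into one column.
        col = [int(any(mask[y][x] for x in range(x0, x1 + 1)))
               for y in range(top, bottom + 1)]
        for ry0, ry1 in runs(col):
            y0, y1 = ry0 + top, ry1 + top
            if depth > 1:
                rects.extend(separate(mask, x0, y0, x1, y1, depth - 1))
            else:
                rects.append((x0, y0, x1, y1))
    return rects


def segment(mask: Mask, levels: int = 2) -> List[Rect]:
    """3x3 erosion, 5x5 dilation, then `levels` rounds of separation."""
    cleaned = morph(morph(mask, 3, erode=True), 5, erode=False)
    return separate(cleaned, 0, 0, len(cleaned[0]) - 1, len(cleaned) - 1, levels)
```

Calling `segment(mask)` on a list of 0/1 rows returns one rectangle per isolated region. The second level of recursion matters when two regions overlap in their column ranges but not in their row ranges: a single pass yields rectangles as wide as the shared stripe, and the second pass trims each one to its own columns.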

Binu Mathew