
7.6 Architectural Implications

To report the algorithmic complexity of the various phases of the visual feature recognizer, an $n\times n$ pixel square image is assumed in this section. The flesh tone detector applies algebraic transformations and inequalities to each pixel, so its complexity is $O(n^{2})$. The face detectors sweep a base detector across all pixel locations in the image and then rescale the image, so their complexity grows as $O(n^{2}\log(n))$. This is compounded by the complexity of the base detector itself. For the Rowley method using $N$ neurons of length $L$, the base detector complexity is $O(LN)$, giving an overall complexity of $O(n^{2}\log(n)LN)$. The complexity of a Viola/Jones style base detector with $N$ rectangle features (Footnote 7.1) is $O(N)$, which yields an overall complexity of $O(n^{2}\log(n)N)$. For each region where a face is likely to be present, EigenFaces performs $O(Mn^{2}+KM)$ operations, where $M$ is the number of eigenvectors and $K$ is the number of known faces. This complexity arises from the $M$ dot products on vectors of length $n^{2}$ performed while projecting the test image and the $K$ dot products on $M$-element vectors performed while finding the most similar known face.
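
The following is a minimal C sketch of the multiscale scan structure that produces the $O(n^{2}\log(n))$ factor: with a fixed rescaling ratio there are $O(\log n)$ pyramid levels, each contributing $O(n^{2})$ window positions. The names (scan_pyramid, base_detector, report) are illustrative and assume a precomputed image pyramid and a caller-supplied base detector; they are not taken from the actual implementation.

    #include <stddef.h>

    #define WINDOW 30   /* side of the detection window, per the text */

    /* Base-detector callback: returns nonzero if a face is likely in the
     * WINDOW x WINDOW patch whose top-left corner is (x, y) of a gray scale
     * image that is 'width' pixels wide.  Its own cost is O(LN) for the
     * Rowley net or O(N) for a Viola/Jones style feature cascade. */
    typedef int (*base_detector)(const unsigned char *img, size_t width,
                                 size_t x, size_t y);

    /* Sweep the base detector over every pixel position of every level of
     * an image pyramid.  O(log n) levels of O(n^2) positions each give
     * O(n^2 log n) detector invocations in total; the detector's own cost
     * multiplies that figure. */
    void scan_pyramid(const unsigned char *const levels[],
                      const size_t widths[], const size_t heights[],
                      size_t num_levels, base_detector detect,
                      void (*report)(size_t level, size_t x, size_t y))
    {
        for (size_t l = 0; l < num_levels; l++)
            for (size_t y = 0; y + WINDOW <= heights[l]; y++)
                for (size_t x = 0; x + WINDOW <= widths[l]; x++)
                    if (detect(levels[l], widths[l], x, y))
                        report(l, x, y);
    }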

In all cases, the workload scales faster than $n^{2}$ as the image size is increased, so high performance architectures are necessary for larger images. For increased accuracy, the Rowley and Viola detectors need larger numbers of neurons and features respectively, leading to a linear increase in compute requirements. For EigenFaces, increasing the discrimination by using a larger number of eigenvectors leads to a linear increase in the compute requirements, as does increasing the number of known faces to check against.
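
As a concrete instance of this scaling (using the $O(n^{2}\log(n)N)$ bound for the Viola detector with $N$ held fixed), doubling the image dimension from $n$ to $2n$ increases the work by a factor of
$$\frac{(2n)^{2}\log(2n)}{n^{2}\log(n)} = 4\left(1+\frac{\log 2}{\log n}\right),$$
which is roughly $4.5$ for $n=256$; the cost therefore grows somewhat faster than the pixel count alone.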

Each of the phases is a natural fit for a streaming architecture. Since the flesh tone detector works on one pixel at a time, the image may be streamed a pixel or a raster line at a time through the processor. The face detectors work on rectangular regions of an image, so the ability to hold a $30\times30$ image window on chip and stream the neurons or features through the processor is important. For a modest increase in on chip storage to about 16 KB, both the neuron and feature descriptions can be held within the processor and image windows may be streamed through it. Since both detectors sweep their image windows row by row and column by column, the ability to hold 30 raster lines on chip greatly reduces the number of image window fetch operations. Since they both work on gray scale images, the additional SRAM required is merely 9.3 KB for $320\times200$ sized images. While they have very different conceptual backgrounds, the base detectors of the Viola and Rowley algorithms are remarkably similar: both involve indirect vector access and dot product operations. The Viola algorithm uses a short vector length of nine and integer multiply accumulate operations, while the Rowley method uses longer vectors with lengths ranging from 11 to 151, the sizes 101, 151 and 26 covering 87.8% of all evaluated neurons. Currently, neural net evaluation involves floating point multiply accumulate operations. Given the limited range of weights and histogram equalized image pixels, this could possibly be converted to scaled integer arithmetic.
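
A minimal sketch of this shared kernel is given below, assuming the scaled integer arithmetic speculated on above rather than the floating point arithmetic the Rowley net normally uses; the descriptor layout and the name eval_descriptor are hypothetical. It shows the two operations both base detectors reduce to: an indirect gather from the on chip $30\times30$ window followed by a multiply accumulate over the streamed weights.

    #include <stdint.h>
    #include <stddef.h>

    #define WIN 30                  /* 30x30 detection window held on chip */

    /* One streamed descriptor: a Rowley neuron or a Viola rectangle feature
     * reduces to the same kernel.  'len' is 9 for a rectangle feature
     * expressed through the integral image, or 11..151 for a neuron
     * (101, 151 and 26 being the most common sizes). */
    struct descriptor {
        size_t          len;
        const uint16_t *index;   /* offsets into the flattened window */
        const int32_t  *weight;  /* scaled-integer weights (assumption) */
    };

    /* Evaluate one streamed descriptor against the window held in local
     * storage: gather pixels through the index vector, then accumulate. */
    static int64_t eval_descriptor(const int32_t window[WIN * WIN],
                                   const struct descriptor *d)
    {
        int64_t acc = 0;
        for (size_t i = 0; i < d->len; i++)
            acc += (int64_t)d->weight[i] * window[d->index[i]];
        return acc;
    }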

Similarly, EigenFaces is dominated by floating point dot products, which in turn depend on floating point multiply accumulate operations. Each test image needs to be projected to face space based on the stored eigenvectors. This is a series of dot product operations, and each stored eigenvector may simply be streamed through the processor while the flattened image is held within the processor. The resulting projection is a vector whose length equals the number of eigenvectors; values of 50 or more eigenvectors are required in practice. Identification can be done by holding the projected test image constant in the processor and streaming the known projected images through it to compute dot products. Thus, on the whole, the visual feature algorithms lend themselves to efficient stream processor implementations.
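
The sketch below illustrates the two EigenFaces stream patterns just described; the function names and the use of a mean-subtracted image and a dot product similarity score are illustrative assumptions, since the actual recognizer may use a distance measure, which reduces to the same multiply accumulate stream.

    #include <stddef.h>
    #include <float.h>

    /* Project a flattened, mean-subtracted test image (length n2 = n*n)
     * onto M eigenvectors streamed one at a time past the image held in
     * the processor.  coeff[] receives the M coefficients.  Cost: O(M*n2). */
    static void project(const float *image, size_t n2,
                        const float *const eig[], size_t M, float coeff[])
    {
        for (size_t m = 0; m < M; m++) {
            float dot = 0.0f;
            for (size_t i = 0; i < n2; i++)
                dot += eig[m][i] * image[i];
            coeff[m] = dot;
        }
    }

    /* Identify by holding the projected test image constant and streaming
     * the K known projections past it, one dot product each.  Cost: O(K*M). */
    static size_t identify(const float coeff[], size_t M,
                           const float *const known[], size_t K)
    {
        size_t best = 0;
        float best_sim = -FLT_MAX;
        for (size_t k = 0; k < K; k++) {
            float sim = 0.0f;
            for (size_t m = 0; m < M; m++)
                sim += coeff[m] * known[k][m];
            if (sim > best_sim) { best_sim = sim; best = k; }
        }
        return best;
    }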


