The Perception Processor

Binu Mathew


Recognizing speech, gestures, and visual features is an important interface capability for future embedded mobile systems. Unfortunately, the real-time performance requirements of complex perception applications cannot be met by current embedded processors and often exceed even the capability of high-performance microprocessors. Moreover, the energy budget of current high-performance processors is infeasible in the embedded space. The normal approach is to resort to a custom ASIC to meet performance and energy constraints. However, ASICs incur expensive and lengthy design cycles, and they are so specialized that they cannot support multiple applications or even evolutionary improvements in a single application. This dissertation introduces a VLIW perception processor that combines clustered function units, compiler-controlled data flow, and compiler-controlled clock gating with hardware support for modulo scheduling, address generation units, and a scratch-pad memory system to achieve very high performance for perceptual algorithms at low energy consumption. The architecture is evaluated using benchmark algorithms taken from the complex speech and visual feature recognition, security, and signal processing domains. Since energy and delay are commonly traded against each other in design, the energy-delay product of a CMOS implementation of the perception processor is compared against ASICs and general-purpose processors. Using a combination of Spice simulations, real processor power measurements, and architecture simulation, it is shown that the perception processor running at a 1 GHz clock frequency outperforms a 2.4 GHz Pentium 4 by a factor of 1.75. While delivering this performance, it simultaneously achieves an energy-delay product 159 times better than that of a low-power Intel XScale embedded processor.
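The energy-delay product used as the comparison metric above can be illustrated with a minimal sketch. It assumes the conventional definition (energy consumed multiplied by execution time, so lower is better); the abstract does not spell out the exact measurement procedure. The function names and all numbers below are hypothetical and chosen only to make the arithmetic easy to follow. They are not measurements from the dissertation.

```python
def energy_delay_product(avg_power_watts, runtime_seconds):
    """Return the energy-delay product in joule-seconds.

    Assumes the conventional definition: EDP = energy * delay,
    where energy = average power * runtime.
    """
    energy_joules = avg_power_watts * runtime_seconds
    return energy_joules * runtime_seconds


def edp_improvement(baseline, candidate):
    """Return how many times lower the candidate's EDP is than the baseline's.

    Each argument is a (avg_power_watts, runtime_seconds) pair.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical figures for illustration only (not from the dissertation).
baseline = (2.0, 4.0)   # 2 W for 4 s: 8 J of energy, EDP = 32 J*s
candidate = (1.0, 2.0)  # 1 W for 2 s: 2 J of energy, EDP = 4 J*s

ratio = edp_improvement(baseline, candidate)
print(ratio)  # 8.0
```

Because delay enters the product twice (once through energy, once directly), a design that halves both power and runtime improves its energy-delay product by a factor of eight in this sketch, which is why the metric rewards gains in speed and energy efficiency together.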

The perception processor makes sophisticated real-time perception applications possible within an energy budget that is commensurate with the embedded space, a task that is impossible with current embedded processors.
