2.4 Custom Hardware for Perception

Pihl at the Norwegian University of Science and Technology designed the PDF coprocessor, a custom coprocessor implemented in a 0.8 $\mu$m CMOS process, to accelerate the computation of Gaussian observation probabilities in a hidden Markov model based speech recognizer [77]. That work concluded that memory bandwidth was a limiting factor for Gaussian computation. Pihl addressed the problem with a new fixed-point representation, the dynamical circular fixed-point format, which halved the memory bandwidth requirement. Using this format, the PDF coprocessor could evaluate 40,000 39-element Gaussian components in real time at 154 MHz while consuming 853 mW. The work was based on an early version of Sphinx. In the current Sphinx 3.2 version, the workload is larger by a factor of 15.3. Both the workload and the bandwidth requirement are expected to increase further in the future.
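To make the bandwidth argument concrete, the following is a minimal sketch of the per-component work and of the parameter traffic it implies. It is not Pihl's design or the dynamical circular fixed-point format. It assumes diagonal-covariance Gaussians, so each of the 39 elements needs one mean and one inverse variance. The frame rate of 100 frames per second and the 4-byte and 2-byte parameter widths are illustrative assumptions that do not come from [77]. The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, inv_var, log_norm):
    """Log density of one diagonal-covariance Gaussian component.

    log_norm is assumed to be precomputed offline as
    -0.5 * (d * log(2*pi) + sum(log(variance_i))).
    """
    acc = 0.0
    for xi, mi, pi in zip(x, mean, inv_var):
        diff = xi - mi
        acc += diff * diff * pi  # one multiply-accumulate chain per element
    return log_norm - 0.5 * acc


def param_bytes_per_second(components, dims, bytes_per_param, frames_per_sec):
    """Parameter traffic if every component is streamed from memory each frame.

    Two parameters (mean and inverse variance) are fetched per element.
    """
    return components * dims * 2 * bytes_per_param * frames_per_sec


# Assumed figures: 100 frames/s, 4-byte versus 2-byte parameters.
wide = param_bytes_per_second(40_000, 39, 4, 100)
narrow = param_bytes_per_second(40_000, 39, 2, 100)
print(wide, narrow, wide / narrow)
```

Under these assumptions, halving the width of each stored parameter halves the bytes fetched per second, which is the effect the narrower fixed-point representation is reported to achieve.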

An earlier attempt to accelerate speech recognition may be found in the work of Anantharaman and Bisiani [10], who present both a custom architecture and a multiprocessor architecture for improving the performance of the beam search algorithm used by the CMU distributed speech recognition system.

Benedetti and Perona describe an FPGA-based system that exploits memory locality for real-time low-level vision [13]. Their system targeted fast prototyping of low-level vision techniques and used observations about locality in pixel neighborhoods to achieve 2.8 GBytes/second of bandwidth between SRAM components and FPGA-based compute elements.

Binu Mathew