10.5 Summary

The architectural features of the perception processor enable it to deliver 1.74 times the throughput of a Pentium 4 while consuming 13.5 times less energy than an XScale embedded processor. Its architectural efficiency allows it to reach 41.4% of the throughput of the ASIC at five times the ASIC's energy consumption, a small price for its generality and programmability. Since the processor circuits were evaluated at the netlist level and not laid out, rigorous area estimates were not made. Approximate estimates show that the die area is dominated by the SRAM used in the design, and that the function units and interconnect occupy only a small fraction of the overall area. For typical high performance embedded systems, the critical factor is adequate compute ability within a low energy budget, not area. The microprograms for the benchmarks discussed in this chapter took approximately 10 to 20 man-hours each to develop. This effort could be reduced drastically if a high level language compiler were developed. In contrast, ASIC implementations of benchmarks like FFT and Fleshtone might take several man-months of effort. Taken together, these results suggest that where high performance, low design time and low energy consumption must be addressed simultaneously, the perception processor could be an attractive alternative.
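
The ASIC comparison can be folded into a single figure of merit. The sketch below is illustrative only: the energy-delay product is not a metric reported in this chapter, and the calculation assumes that both the throughput and the energy figures refer to the same amount of work. Only the two ratios (41.4% and five times) come from the text; the function and constant names are hypothetical.

```python
# Ratios quoted in the summary: perception processor relative to the ASIC.
THROUGHPUT_VS_ASIC = 0.414  # 41.4% of the ASIC's throughput
ENERGY_VS_ASIC = 5.0        # five times the ASIC's energy consumption


def relative_delay(throughput_ratio: float) -> float:
    """Time per unit of work relative to the baseline (inverse of throughput)."""
    return 1.0 / throughput_ratio


def relative_energy_delay(energy_ratio: float, throughput_ratio: float) -> float:
    """Energy-delay product relative to the baseline.

    Assumption (not from the text): energy_ratio is measured per unit of work.
    """
    return energy_ratio * relative_delay(throughput_ratio)


delay = relative_delay(THROUGHPUT_VS_ASIC)
edp = relative_energy_delay(ENERGY_VS_ASIC, THROUGHPUT_VS_ASIC)

print(f"delay vs ASIC: {delay:.2f}x")          # 2.42x
print(f"energy-delay vs ASIC: {edp:.2f}x")     # 12.08x
```

Under these assumptions the programmable design is about 2.4 times slower than the ASIC and about 12 times worse in energy-delay product, which puts a number on the cost of generality weighed in the paragraph above.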

Binu Mathew