SiliconIntelligence

Bibliography

1
Cognex Inc.
http://www.cognex.com/, 2004.

2
Coreco Inc.
http://www.coreco.com/, 2004.

3
AARTS, B., BARRETEAU, M., BODIN, F., BRINKHAUS, P., CHAMSKI, Z., CHARLES, H.-P., EISENBEIS, C., GURD, J. R., HOOGERBRUGGE, J., HU, P., JALBY, W., KNIJNENBURG, P. M. W., O'BOYLE, M. F. P., ROHOU, E., SAKELLARIOU, R., SCHEPERS, H., SEZNEC, A., STOHR, E., VERHOEVEN, M., AND WIJSHOFF, H. A. G.
OCEANS: Optimizing compilers for embedded applications.
In European Conference on Parallel Processing (1997), pp. 1351-1356.

4
ABNOUS, A., SENO, K., ICHIKAWA, Y., WAN, M., AND RABAEY, J. M.
Evaluation of a low-power reconfigurable DSP architecture.
In IPPS/SPDP Workshops (1998), pp. 55-60.

5
ADVANCED MICRO DEVICES, INC.
AMD Athlon Processor x86 Code Optimization Guide, k ed., Feb. 2002.

6
AGARAM, K., KECKLER, S. W., AND BURGER, D.
A characterization of speech recognition on modern computer systems.
In Proceedings of the 4th IEEE Workshop on Workload Characterization (Dec. 2001).

7
AKTURAN, C., AND JACOME, M. F.
FDRA: A software-pipelining algorithm for embedded VLIW processors.
In Proceedings of the 13th International Symposium on System Synthesis (2000), pp. 34-40.

8
AKTURAN, C., AND JACOME, M. F.
CALiBeR: A software pipelining algorithm for clustered embedded VLIW processors.
In Proceedings of the IEEE/ACM International Conference on Computer Aided Design (2001), pp. 112-118.

9
ALNUWEIRI, H. M., AND PRASANNA, V. K.
Parallel architectures and algorithms for image component labelling.
IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 10 (Oct. 1992), 1014-1034.

10
ANANTHARAMAN, T., AND BISIANI, R.
A hardware accelerator for speech recognition algorithms.
In Proceedings of the 13th International Symposium on Computer Architecture (June 1986).

11
ASANOVIC, K.
The Computer Engineering Handbook.
CRC Press, Dec. 2001, ch. Vector Processors.

12
ATHAS, W., YOUNGS, L., AND REINHART, A.
Compact models for estimating microprocessor frequency and power.
In Proceedings of the 2002 international symposium on Low power electronics and design (2002), ACM Press, pp. 313-318.

13
BENEDETTI, A., AND PERONA, P.
A novel system architecture for real-time low-level vision.
In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS) (1999), pp. 500-503.

14
BERTRAN, A., YU, H., AND SACCHETTO, P.
Face detection project report.
http://ise.stanford.edu/2002projects/ee368/Project/reports/ee368group17.pdf, 2002.

15
BOAHEN, K.
Retinomorphic chips that see quadruple images.
In Microelectronics for Neural, Fuzzy and Bio-Inspired Systems, 1999. MicroNeuro '99 (1999), pp. 12-20.

16
BONA, A., SAMI, M., SCIUTO, D., SILVANO, C., ZACCARIA, V., AND ZAFALON, R.
Energy estimation and optimization of embedded VLIW processors based on instruction clustering.

17
BROOKS, D., TIWARI, V., AND MARTONOSI, M.
Wattch: a framework for architectural-level power analysis and optimizations.
In ISCA (2000), pp. 83-94.

18
BUDIU, M., AND GOLDSTEIN, S. C.
Fast compilation for pipelined reconfigurable fabrics.
In ACM/SIGDA International Symposium on Field Programmable Gate Arrays (Monterey, CA, 1999), S. Kaptanoglu and S. Trimberger, Eds., ACM Press, pp. 195-205.

19
BURGER, D., AND AUSTIN, T. M.
The SimpleScalar tool set, version 2.0.
SIGARCH Computer Architecture News 25, 3 (1997), 13-25.

20
CALLAHAN, T., AND WAWRZYNEK, J.
Adapting software pipelining for reconfigurable computing.
In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES) (San Jose, CA, 2000), ACM.

21
CAMPBELL, M.
Evaluating ASIC, DSP, and RISC architectures for embedded applications.
In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (1998), Springer-Verlag, pp. 261-265.

22
CAO, Y., SATO, T., SYLVESTER, D., ORSHANSKY, M., AND HU, C.
New paradigm of predictive MOSFET and interconnect modeling for early circuit design.
In Proceedings of the IEEE Custom Integrated Circuits Conference (CICC) (June 2000), pp. 201-204.

23
CAO, Y., SATO, T., SYLVESTER, D., ORSHANSKY, M., AND HU, C.
Predictive technology model.
http://www-device.eecs.berkeley.edu/~ptm, 2002.

24
CAT, H. H., EBLE, J. C., WILLS, D. S., DE, V. K., BROOKE, M., AND JOKERST, N. M.
Low power opportunities for a SIMD VLSI architecture incorporating integrated optoelectronic devices.
In Proceedings of GoMAC (Mar. 1996).

25
CONNELL, J.
Face finding.
http://www.research.ibm.com/ecvg/jhc_proj/faces.html, June 2002.

26
CONTE, T. M., DUBEY, P. K., JENNINGS, M. D., LEE, R. B., PELEG, A., RATHNAM, S., SCHLANSKER, M. S., SONG, P., AND WOLFE, A.
Challenges to combining general-purpose and multimedia processors.
IEEE Computer 30, 12 (1997), 33-37.

27
CORREALE, JR., A.
Overview of the power minimization techniques employed in the IBM PowerPC 4xx embedded controllers.
In Proceedings of the 1995 international symposium on Low power design (1995), ACM Press, pp. 75-80.

28
BOLME, D., BEVERIDGE, R., TEIXEIRA, M., AND DRAPER, B.
The CSU face identification evaluation system: Its purpose, features and structure.
In International Conference on Vision Systems (April 2003), pp. 304-311.

29
DAEMEN, J., AND RIJMEN, V.
The block cipher Rijndael.
Smart Card Research and Applications, LNCS 1820 (2000), 288-296.

30
PALLETT, D., FISCUS, J. G., AND PRZYBOCKI, M. A.
1996 preliminary broadcast news benchmark tests.
In Proceedings of the 1997 DARPA Speech Recognition Workshop (Feb. 1997).

31
DEHON, A.
DPGA-coupled microprocessors: Commodity ICs for the early 21st century.
In IEEE Workshop on FPGAs for Custom Computing Machines (Los Alamitos, CA, 1994), D. A. Buell and K. L. Pocek, Eds., IEEE Computer Society Press, pp. 31-39.

32
DELANEY, B., JAYANT, N., HANS, M., SIMUNIC, T., AND ACQUAVIVA, A.
A low-power, fixed-point front-end feature extraction for a distributed speech recognition system.
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2002).

33
ECKSTEIN, E., AND KRALL, A.
Minimizing cost of local variables access for DSP-processors.
In LCTES'99 Workshop on Languages, Compilers and Tools for Embedded Systems (Atlanta, 1999), Y. A. Liu and R. Wilhelm, Eds., vol. 34(7), pp. 20-27.

34
FANG, W.-C.
A system-on-chip design of a low-power smart vision system.
In Proceedings of the IEEE Workshop on Signal Processing Systems (1998), pp. 63-72.

35
FARABOSCHI, P., BROWN, G., FISHER, J. A., DESOLI, G., AND HOMEWOOD, F.
Lx: a technology platform for customizable VLIW embedded processing.
In The 27th Annual International Symposium on Computer architecture 2000 (New York, NY, USA, 2000), ACM Press, pp. 203-213.

36
FARBER, P., AND ASANOVIC, K.
Parallel neural network training on Multi-Spert.
In Proceedings of Third IEEE International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP) (Dec. 1997).

37
FERRETTI, M.
Multimedia extensions in super-pipelined microarchitectures. A new case for SIMD processing?
In Fifth IEEE International Workshop on Computer Architectures for Machine Perception (2000), pp. 249-258.

38
FRIGO, M., AND JOHNSON, S. G.
FFTW: An adaptive software architecture for the FFT.
In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Seattle, WA, May 1998), vol. 3, pp. 1381-1384.

39
GONZALEZ, R., AND HOROWITZ, M.
Energy dissipation in general purpose microprocessors.
IEEE Journal of Solid-State Circuits 31, 9 (September 1996), 1277-1284.

40
GONZALEZ, R. E.
Xtensa: A configurable and extensible processor.
IEEE Micro 20, 2 (March 2000), 60-70.

41
GOWAN, M. K., BIRO, L. L., AND JACKSON, D. B.
Power considerations in the design of the Alpha 21264 microprocessor.
In Design Automation Conference (1998), pp. 726-731.

42
GRADY, T.
Bit-reversed addressing in C on the C3x.
In TMS320 DSP Designer's Notebook, vol. SPRA204. Texas Instruments, 1992.

43
HAGER, G. D., AND TOYAMA, K.
X vision: A portable substrate for real-time vision applications.
Computer Vision and Image Understanding: CVIU 69, 1 (1998), 23-37.

44
HAMMERSTROM, D.
A VLSI architecture for high-performance, low-cost, on-chip learning.
In International Joint Conference on Neural Networks (1990), pp. 537-544.

45
HARRISON, R. R.
An Analog VLSI Motion Sensor Based on the Fly Visual System.
PhD thesis, California Institute of Technology, May 2000.

46
HENNESSY, J., AND PATTERSON, D.
Computer Architecture: A Quantitative Approach, 3rd ed.
Morgan Kaufmann, 2002.

47
HOOGERBRUGGE, J., AND AUGUSTEIJN, L.
Instruction scheduling for TriMedia.
Journal of Instruction-Level Parallelism, 1(1) (Feb. 1999).

48
HOOGERBRUGGE, J., CORPORAAL, H., AND MULDER, H.
MOVE: a framework for high-performance processor design.
In Proceedings of the 1991 ACM/IEEE conference on Supercomputing (1991), ACM Press, pp. 692-701.

49
HUANG, X., ALLEVA, F., HON, H.-W., HWANG, M.-Y., LEE, K.-F., AND ROSENFELD, R.
The SPHINX-II speech recognition system: an overview.
Computer Speech and Language 7, 2 (1993), 137-148.

50
INTEL CORPORATION.
Using streaming SIMD extensions 2 (SSE2) to evaluate hidden Markov model with Viterbi decoding.
Tech. Rep. AP-946, Intel Corporation, 2000.

51
INTEL CORPORATION.
Intel Pentium 4 Processor Optimization Reference Manual, 2002.

52
INTEL CORPORATION.
Open source computer vision library.
http://www.intel.com/research/mrl/research/opencv/, 2002.

53
JOHNSON, M. C., SOMASEKHAR, D., AND ROY, K.
Leakage control with efficient use of transistor stacks in single threshold CMOS.
In Proceedings of the 36th ACM/IEEE conference on Design automation conference (1999), ACM Press, pp. 442-445.

54
JONES, S. P.
Haskell 98 Language and Libraries.
Cambridge University Press, Cambridge, UK, 2003.

55
JOSHI, S. M.
Some fast speech processing algorithms using Altivec technology.
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (Mar. 1999), pp. 2135-2138.

56
KARL, W.
Some design aspects for VLIW architectures exploiting fine-grained parallelism.
In Parallel Architectures and Languages Europe (1993), pp. 582-599.

57
KLEIHORST, R., ABBO, A., VAN DER AVOIRD, A., OP DE BEECK, M., SEVAT, L., WIELAGE, P., VAN VEEN, R., AND VAN HERTEN, H.
Xetal: A low-power high-performance smart camera processor.
In The IEEE International Symposium on Circuits and Systems (ISCAS) (2001), pp. 215-218.

58
KRASHINSKY, R.
Microprocessor energy characterization and optimization through fast, accurate, and flexible simulation.
Master's thesis, Massachusetts Institute of Technology, May 2001.

59
LAI, C., LU, S.-L., AND ZHAO, Q.
Performance analysis of speech recognition software.
In Proceedings of the Fifth Workshop on Computer Architecture Evaluation using Commercial Workloads (Feb. 2002).

60
LAPINSKII, V., JACOME, M., AND DE VECIANA, G.
Application-specific clustered VLIW datapaths: early exploration on a parameterized design space.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 21, 8 (Aug. 2002), 889-903.

61
LEE, C., LEE, J. K., HWANG, T., AND TSAI, S.-C.
Compiler optimization on instruction scheduling for low power.
In Proceedings of the 13th International Symposium on System Synthesis (ISSS'00) (2000), IEEE Computer Society, p. 55.

62
LEE, W., BARUA, R., FRANK, M., SRIKRISHNA, D., BABB, J., SARKAR, V., AND AMARASINGHE, S.
Space-time scheduling of instruction-level parallelism on a raw machine.
In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (1998), ACM Press, pp. 46-57.

63
LEUPERS, R.
Instruction scheduling for clustered VLIW DSPs.
In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT) (Oct. 2000), pp. 291-300.

64
MARTIN, A. J., NYSTROEM, M., AND PENZES, P.
ET2: A metric for time and energy efficiency of computation.
Tech. Rep. CaltechCSTR:2001.007, Caltech Computer Science, 2001.

65
MATHEW, B., DAVIS, A., AND EVANS, R.
A characterization of visual feature recognition.
In Proceedings of the IEEE 6th Annual Workshop on Workload Characterization (WWC-6) (October 2003), pp. 3-11.

66
MATHEW, B., DAVIS, A., AND FANG, Z.
A low-power accelerator for the Sphinx 3 speech recognition system.
In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '03) (October 2003), pp. 210-219.

67
MATHEW, B., DAVIS, A., AND IBRAHIM, A.
Perception coprocessors for embedded systems.
In Proceedings of the Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia) (October 2003), pp. 109-116.

68
MCVOY, L. W., AND STAELIN, C.
lmbench: Portable tools for performance analysis.
In USENIX Annual Technical Conference (1996), pp. 279-294.

69
MEMIK, S., BOZORGZADEH, E., KASTNER, R., AND SARRAFZADEH, M.
SPS: A strategically programmable system.
In Proceedings of the Reconfigurable Architectures Workshop (RAW) (Apr. 2001).

70
MEMIK, S. O., BOZORGZADEH, E., KASTNER, R., AND SARRAFZADEH, M.
A super-scheduler for embedded reconfigurable systems.
In Proceedings of the International Conference on Computer-Aided Design (ICCAD) (Nov. 2001), p. 391.

71
MIPS TECHNOLOGIES, INC.
MIPS R4000 Microprocessor User's Manual, Second Edition, April 1993.

72
MODULE RESEARCH CENTER.
NeuroMatrix NM6403 digital signal processor.
Tech. Rep. 431282.001D2, Module Research Center, 2000.

73
MORETTO, P.
Mapping of speech front-end signal processing to high performance vector architectures.
Tech. Rep. TR-95-063, International Computer Science Institute, University of California at Berkeley, 1995.

74
MOSUR, R.
Efficient Algorithms for Speech Recognition.
PhD thesis, Carnegie Mellon University, May 1996.
CMU-CS-96-143.

75
PENTLAND, A.
Looking at people: Sensing for ubiquitous and wearable computing.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 22, 1 (Jan. 2000), 107-118.

76
PERING, T., AND BRODERSEN, R.
Dynamic voltage scaling and the design of a low-power microprocessor system.
In Proceedings of the International Symposium on Computer Architecture ISCA'98 (June 1998).

77
PIHL, J., SVENDSEN, T., AND JOHNSEN, M. H.
A VLSI implementation of PDF computations in HMM based speech recognition.
In Proceedings of the IEEE Region Ten Conference on Digital Signal Processing Applications (TENCON'96) (Nov. 1996).

78
POWELL, M., YANG, S.-H., FALSAFI, B., ROY, K., AND VIJAYKUMAR, T. N.
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories.
In Proceedings of the 2000 International Symposium on Low Power Electronics and Design (2000), ACM Press, pp. 90-95.

79
RABINER, L., AND JUANG, B.-H.
Fundamentals of Speech Recognition.
Prentice Hall, 1993, ch. 9, p. 494.

80
RABINER, L. R.
A tutorial on hidden Markov models and selected applications in speech recognition.
Proceedings of the IEEE 77, 2 (Feb. 1989), 257-286.

81
RAU, B. R.
Iterative modulo scheduling: an algorithm for software pipelining loops.
In Proceedings of the 27th Annual International Symposium on Microarchitecture (1994), ACM Press, pp. 63-74.

82
RIXNER, S., DALLY, W. J., KAPASI, U. J., KHAILANY, B., LOPEZ-LAGUNAS, A., MATTSON, P. R., AND OWENS, J. D.
A bandwidth-efficient architecture for media processing.
In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-31) (Nov. 1998), pp. 3-13.

83
ROWLEY, H. A., BALUJA, S., AND KANADE, T.
Neural network-based face detection.
IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 1 (1998), 23-38.

84
RUSSELL, J., AND JACOME, M.
Software power estimation and optimization for high performance, 32-bit embedded processors.

85
RUSSELL, R. M.
The CRAY-1 computer system.
Communications of the ACM 21, 1 (1978), 63-72.

86
SCHAPIRE, R. E.
The boosting approach to machine learning: An overview.
In MSRI Workshop on Nonlinear Estimation and Classification (2002).

87
SCHMIT, H., WHELIHAN, D., TSAI, A., MOE, M., LEVINE, B., AND TAYLOR, R.
PipeRench: a virtualized programmable datapath in 0.18 micron technology.
In Proceedings of the IEEE Custom Integrated Circuits Conference (2002), pp. 63-66.

88
SMITH, J. E.
Decoupled access/execute computer architectures.
In Proceedings of the 9th annual symposium on Computer Architecture (1982), IEEE Computer Society Press, pp. 112-119.

89
SMITH, M. D., LAM, M., AND HOROWITZ, M. A.
Boosting beyond static scheduling in a superscalar processor.
In Proceedings of the 17th Annual Symposium on Computer Architecture (1990), pp. 344-354.

90
SORIANO, M., MARTINKAUPPI, B., HUOVINEN, S., AND LAAKSONEN, M.
Using the skin locus to cope with changing illumination conditions in color-based face tracking.
In Proceedings of the IEEE Nordic Signal Processing Symposium (2000), pp. 383-386.

91
SRIVASTAVA, S.
Fast Gaussian evaluations in large vocabulary continuous speech recognition.
M.S. Thesis, Department of Electrical and Computer Engineering, Mississippi State University, Oct. 2002.

92
STERN, R. M.
Specification of the 1996 HUB 4 broadcast news evaluation.
http://www.nist.gov/speech/publications/darpa97/pdf/stern1.pdf, 1996.

93
SUNDARARAJAN, V., AND PARHI, K. K.
Low power synthesis of dual threshold voltage CMOS VLSI circuits.
In Proceedings of the 1999 international symposium on Low power electronics and design (1999), ACM Press, pp. 139-144.

94
TEXAS INSTRUMENTS.
TMS320C6000 CPU and Instruction Set Reference Guide, spru189f ed., Oct. 2000.

95
TIWARI, V., MALIK, S., WOLFE, A., AND LEE, M.
Instruction level power analysis and optimization of software.
In Proceedings of the Ninth International Conference on VLSI Design (Jan. 1996), pp. 326-328.

96
TIWARI, V., SINGH, D., RAJGOPAL, S., MEHTA, G., PATEL, R., AND BAEZ, F.
Reducing power in high-performance microprocessors.
In Proceedings of the 35th Annual Design Automation Conference (1998), ACM Press, pp. 732-737.

97
TONG, Y. F., RUTENBAR, R., AND NAGLE, D.
Minimizing floating-point power dissipation via bit-width reduction.
In Proceedings of the 1998 International Symposium on Computer Architecture Power Driven Microarchitecture Workshop (1998).

98
TSENG, J. H., AND ASANOVIC, K.
Energy-efficient register access.
In Proceedings of the 13th Symposium on Integrated Circuits and Systems Design (SBCCI'00) (2000), IEEE Computer Society, p. 377.

99
TURK, M., AND PENTLAND, A.
Face recognition using Eigenfaces.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (June 1991), pp. 586-591.

100
UNGER, S., AND MUELLER, F.
Handling irreducible loops: Optimized node splitting vs. DJ-graphs.
Lecture Notes in Computer Science 2150 (2001), 207+.

101
VAN ROSSUM, G.
Python Reference Manual, 2.3.3 ed., Dec. 2003.

102
VERMA, A., FARUQUIE, T., NETI, C., BASU, S., AND SENIOR, A.
Late integration in audio-visual continuous speech recognition.
In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU) (1999).

103
VIOLA, P., AND JONES, M.
Rapid object detection using a boosted cascade of simple features.
In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Dec. 2001).

104
WAINGOLD, E., TAYLOR, M., SRIKRISHNA, D., SARKAR, V., LEE, W., LEE, V., KIM, J., FRANK, M., FINCH, P., BARUA, R., BABB, J., AMARASINGHE, S., AND AGARWAL, A.
Baring it all to software: Raw machines.
IEEE Computer 30, 9 (1997), 86-93.

105
WANG, C.-L., BHAT, P. B., AND PRASANNA, V. K.
High performance computing for vision.
Proceedings of the IEEE 84, 7 (July 1996), 931-946.

106
WAWRZYNEK, J., ASANOVIC, K., KINGSBURY, B., BECK, J., JOHNSON, D., AND MORGAN, N.
SPERT-II: A vector microprocessor system and its application to large problems in backpropagation training.
In Advances in Neural Information Processing Systems (1996), D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds., vol. 8, The MIT Press, pp. 619-625.

107
WEEMS, C. C.
The second generation image understanding architecture and beyond.
In Proceedings of Computer Architectures for Machine Perception (Nov. 1993), pp. 276-285.

108
WEISS, M., AND FETTWEIS, G.
Dynamic codewidth reduction for VLIW instruction set architectures in digital signal processors, 1996.

109
WESTE, N. H. E., AND ESHRAGHIAN, K.
Principles of CMOS VLSI Design, A Systems Perspective, second ed.
Addison Wesley, 1993.

110
YANG, M.-H., KRIEGMAN, D., AND AHUJA, N.
Detecting faces in images: A survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 24, 1 (2002), 34-58.

111
YOUNG, S.
Large vocabulary continuous speech recognition: A review.
In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (Dec. 1995), pp. 3-28.

112
YUN, H.-S., AND KIM, J.
Power-aware modulo scheduling for high-performance VLIW processors.
In Proceedings of the 2001 International Symposium on Low Power Electronics and Design (2001), ACM Press, pp. 40-45.



Binu Mathew