9.6.4 Addressing Modes

The address generator can directly compute array references of the form $A[i\times P+Q][j\times R+S].field$, as well as vector accesses, when $i$ and $j$ are the loop variables of nested loops, when one of the loops has been unrolled, and, more importantly, when the inner loop has been modulo-scheduled. For higher-dimensional arrays, the base address is repeatedly recomputed using an ALU, and the last two dimensions are handled by the address generator.
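To make the form $A[i\times P+Q][j\times R+S].field$ concrete, the following is a minimal software sketch of the address arithmetic, not a description of the hardware itself. It assumes a row-major array layout and byte addressing; the names `element_address`, `address_stream`, `num_cols`, `elem_size`, and `field_offset` are hypothetical and introduced only for illustration.

```python
def element_address(base, i, j, P, Q, R, S, num_cols, elem_size, field_offset):
    """Byte address of A[i*P + Q][j*R + S].field.

    Assumes a row-major layout with num_cols elements per row, each element
    elem_size bytes wide, and the field located field_offset bytes into it.
    """
    row = i * P + Q
    col = j * R + S
    return base + (row * num_cols + col) * elem_size + field_offset


def address_stream(base, n_i, n_j, P, Q, R, S, num_cols, elem_size, field_offset):
    """Yield the addresses touched by a two-level loop nest (j is the inner loop)."""
    for i in range(n_i):
        for j in range(n_j):
            yield element_address(base, i, j, P, Q, R, S,
                                  num_cols, elem_size, field_offset)
```

For example, with `P = 2`, `Q = 1`, `R = 3`, `S = 0`, eight 16-byte elements per row, and a field at byte offset 4, the reference for `i = 1`, `j = 2` selects row 3 and column 6, so `element_address(1000, 1, 2, 2, 1, 3, 0, 8, 16, 4)` evaluates to `1000 + (3*8 + 6)*16 + 4`.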

Another important access pattern is indirect access of the form $A[B[i]]$. This is a common ingredient of neural network evaluation and can be used to implement bit-reversed addressing for FFT. It is also a generic access pattern: any complex access pattern can be precomputed, stored in $B[\,]$, and used at run time to access the data in $A[\,]$. Vector-indirect accesses may be performed by passing an ALU-generated $B[i]$ address through the adder in Figure 9.7, thereby offsetting it by the base address of $A[\,]$. The ALU address can be computed, or it can be streamed into the ALU from SRAM by another address generator. Using two address generators and an ALU, complicated access patterns can be realized with high throughput. If the cost in terms of SRAM and function unit usage becomes too high, the address generator may be extended to support other application-specific access patterns. The stream address generator effectively converts the scratch-pad memory into a vector register file that can operate over complex access patterns and even interleave vectors for higher throughput. From an operational perspective, associating stream address generators with small scratch-pad memories unifies vector and VLIW architectures.
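The indirect pattern $A[B[i]]$ can be illustrated with a small software model that uses bit-reversed addressing as the precomputed index table. This is only a sketch under stated assumptions: memory is modeled as a word-addressed Python list, the addition of the base address loosely stands in for the adder of Figure 9.7, and the names `bit_reversed_indices`, `indirect_gather`, and `base_a` are hypothetical.

```python
def bit_reversed_indices(n):
    """Build the index table B for a length-n bit-reversal permutation.

    n must be a positive power of two.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    bits = n.bit_length() - 1
    table = []
    for i in range(n):
        rev = 0
        for b in range(bits):
            if i & (1 << b):
                rev |= 1 << (bits - 1 - b)  # mirror bit b around the middle
        table.append(rev)
    return table


def indirect_gather(memory, base_a, index_stream):
    """Model A[B[i]]: offset each streamed index by the base of A, then read."""
    return [memory[base_a + idx] for idx in index_stream]
```

Here `index_stream` plays the role of the $B[i]$ values, which could equally be computed on the fly or streamed from a table such as the one returned by `bit_reversed_indices(8)`; any other precomputed permutation could be substituted without changing `indirect_gather`.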

Binu Mathew