SiliconIntelligence

9.2 Instruction Format

Figure 9.3: Microinstruction Format

To follow the discussion of the perception processor's internals, a brief introduction to the microinstruction format and the instruction fetch mechanism is necessary. Figure 9.3 shows the composition of a typical instruction word. The instruction word width and format are fixed for a given configuration, but they vary between configurations depending on the type and number of function units and interconnect paths.

The type field specifies whether the instruction is a normal VLIW-style instruction bundle or a reconfiguration command. Reconfiguration commands dynamically modify the operation of the address generators and the loop unit. The type field is followed by an instruction packet for each function unit. If the type field specifies a reconfiguration command, the instruction packet fields have alternate interpretations, and the decoder generates NOP packets for all the function units.

Each instruction packet consists of an opcode, mux selects for the A and B operand selection muxes of a function unit, and enable signals for the A and B input registers. The registers latch new values only when their enable signals are asserted. These FU opcode packets are followed by address generator operations, each of which specifies a load, store, or NOP and the address context register to be used for the load or store operation. These are in turn followed by select signals for the interconnect muxes. Finally, a set of constant fields supports constants used in the code. The constant fields have different interpretations (e.g., one 16-bit constant, two 8-bit constants, four 4-bit constants, etc.) depending on the context. The decoder can perform modifications such as sign or zero extension before the constants are presented to the function units.

The instruction memory has a one-cycle latency, and the decoder adds another cycle of latency. This two-cycle fetch delay is accounted for in branch instructions and the loop unit logic.
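The context-dependent interpretation of the constant fields described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the decoder's actual logic: the function names `split_constant`, `sign_extend`, and `zero_extend` are hypothetical, and the ordering of sub-constants (most significant first) is assumed, since the text does not specify it.

```python
def split_constant(field: int, total_bits: int, parts: int) -> list[int]:
    """Split a constant field into equal-width sub-constants.

    Assumption: sub-constants are returned most significant first.
    """
    if parts <= 0 or total_bits % parts != 0:
        raise ValueError("field width must divide evenly into parts")
    width = total_bits // parts
    mask = (1 << width) - 1
    return [(field >> (width * i)) & mask for i in reversed(range(parts))]


def zero_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as an unsigned number."""
    return value & ((1 << width) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)
```

For example, `split_constant(0xABCD, 16, 2)` yields the two 8-bit constants `0xAB` and `0xCD`, and `sign_extend(0xAB, 8)` interprets the first of them as -85, whereas `zero_extend(0xAB, 8)` leaves it as 171.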
Since the actual bit positions of the various fields depend on the configuration, the instruction fetch logic and the decoder are generated automatically by a netlist generator tool from the processor configuration and bundling constraints.
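To make the dependence of field positions on the configuration concrete, the following sketch derives a field layout from a configuration record and extracts fields from an instruction word. It is a software illustration only and does not represent the netlist generator itself. All names (`Config`, `build_layout`, `extract`), all field widths, the one-bit type field, and the choice to pack fields starting from the least significant bit are assumptions made for the example; only the order of the fields follows the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical processor configuration; every width here is an assumption."""
    num_fus: int                 # number of function units
    opcode_bits: int             # opcode width per function unit
    operand_sel_bits: int        # select width for each of the A and B operand muxes
    num_agens: int               # number of address generators
    agen_op_bits: int            # encodes load, store, or NOP
    context_bits: int            # selects the address context register
    num_interconnect_muxes: int
    interconnect_sel_bits: int
    constant_bits: int           # total width of the constant fields
    type_bits: int = 1           # normal bundle vs. reconfiguration command


def build_layout(cfg: Config) -> tuple[dict[str, tuple[int, int]], int]:
    """Return ({field name: (bit offset, width)}, total word width).

    Fields appear in the order given in the text; offsets count from bit 0.
    """
    fields = [("type", cfg.type_bits)]
    for i in range(cfg.num_fus):
        fields += [
            (f"fu{i}.opcode", cfg.opcode_bits),
            (f"fu{i}.sel_a", cfg.operand_sel_bits),
            (f"fu{i}.sel_b", cfg.operand_sel_bits),
            (f"fu{i}.en_a", 1),   # enable for the A input register
            (f"fu{i}.en_b", 1),   # enable for the B input register
        ]
    for i in range(cfg.num_agens):
        fields += [
            (f"agen{i}.op", cfg.agen_op_bits),
            (f"agen{i}.context", cfg.context_bits),
        ]
    for i in range(cfg.num_interconnect_muxes):
        fields.append((f"imux{i}.sel", cfg.interconnect_sel_bits))
    fields.append(("constants", cfg.constant_bits))

    layout: dict[str, tuple[int, int]] = {}
    offset = 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    return layout, offset


def extract(word: int, layout: dict[str, tuple[int, int]], name: str) -> int:
    """Pull one named field out of an instruction word."""
    offset, width = layout[name]
    return (word >> offset) & ((1 << width) - 1)
```

With a hypothetical configuration such as `Config(num_fus=2, opcode_bits=4, operand_sel_bits=3, num_agens=1, agen_op_bits=2, context_bits=2, num_interconnect_muxes=2, interconnect_sel_bits=3, constant_bits=16)`, `build_layout` reports a 51-bit word. Changing the number of function units shifts every later field, which is why a hand-written decoder would not survive a configuration change.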



Binu Mathew