You do have a nice point here. Then the compute unit can simply stall the commands coming out of the register bank. Without this I need to stall the write FIFO, which feels less elegant and has given me some pain in terms of combinational loops. The drawback though is that you have to duplicate a significant amount of registers in the compute unit.
reply