SPI Engine Pipeline Delays

The SPI Engine implementation imposes certain constraints on the timing of different commands. Each instruction takes some number of cycles to execute, which may depend on the instruction parameters. Additionally, the internal architecture of the SPI Engine introduces delays of its own, which become relevant when the Offload functionality is not in use.

Instruction Execution

Every instruction requires a minimum of 1 cycle for communication between the offload module and the execution module. The Chip Select, Sleep, Transfer and Sync instructions require one more cycle to check the idle condition, for a fixed delay of 2 cycles in total.

The exact values are, counting from the execution module:

Instruction	Cycles
Configuration Write	1 cycle.
Sync	2 cycles.
Chip-select	\(2 + 2 \cdot t \cdot ((div+1) \cdot 2)\) cycles, where \(t\) is the chip select delay parameter of the instruction and \(div\) is the prescaler register value. The CS value changes after the first \(2 + t \cdot ((div+1) \cdot 2)\) cycles.
Sleep	\(2 + t \cdot ((div+1) \cdot 2)\) cycles, where \(t\) is the sleep delay parameter of the instruction and \(div\) is the prescaler register value.
Transfer	2 cycles, plus the transfer time.

Because they are counted from the execution module, these values can be used directly to calculate delays in the offload case: simply add up the execution time of each instruction. For other cases, the detailed delays of the architecture are needed.
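As a rough illustration of adding up instruction times for an offload sequence, the Python sketch below encodes the table above. It reads the expressions in the table as products, i.e. \(2 + 2 \cdot t \cdot ((div+1) \cdot 2)\) for Chip-select. The function names, the example sequence and the `transfer_time` argument (the transfer duration in cycles, which is not derived here) are assumptions made for this example and are not part of the SPI Engine.

```python
# Cycle counts per instruction, counted from the execution module.
# All names here are hypothetical helpers for illustration only.

CONFIG_WRITE_CYCLES = 1
SYNC_CYCLES = 2


def chip_select_cycles(t, div):
    """Total cycles for a Chip-select instruction: 2 + 2*t*((div+1)*2)."""
    return 2 + 2 * t * ((div + 1) * 2)


def chip_select_change_cycle(t, div):
    """Cycle count after which the CS value changes: 2 + t*((div+1)*2)."""
    return 2 + t * ((div + 1) * 2)


def sleep_cycles(t, div):
    """Total cycles for a Sleep instruction: 2 + t*((div+1)*2)."""
    return 2 + t * ((div + 1) * 2)


def transfer_cycles(transfer_time):
    """2 fixed cycles plus the transfer time (supplied by the caller, in cycles)."""
    return 2 + transfer_time


# Example offload sequence (assumed values): assert CS, transfer, deassert CS, sync.
sequence = [
    chip_select_cycles(t=1, div=1),
    transfer_cycles(transfer_time=32),
    chip_select_cycles(t=1, div=1),
    SYNC_CYCLES,
]
total_cycles = sum(sequence)
print(total_cycles)
```

With these example values, each Chip-select takes 10 cycles and the transfer takes 34, so the whole sequence adds up to 56 cycles.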

Detailed Delays

This section lists the delays inside the SPI Engine architecture. Making use of this information requires some familiarity with the HDL implementation (knowledge of the sub-modules and the way they communicate).

Offload Module

All the delays measured for this module are in terms of SPI Engine clock cycles.

Trigger input to command valid output: 1 cycle, plus 1 to 2 cycles from the 2FF CDC if the trigger is asynchronous (0 otherwise).
Trigger input to sdo_data_valid: 1 cycle, plus 1 to 2 cycles from the 2FF CDC if the trigger is asynchronous (0 otherwise).
Maximum command throughput: 1 command per cycle.
sdi_data_valid to offload_sdi_valid: 0 cycles.

Interconnect Module

All the delays measured for this module are in terms of SPI Engine clock cycles.

The interconnect will only accept one manager at a time, and will wait until a sync handshake back to the manager is completed to free the channel.

Command valid input to command valid output (s0/s1 to m): 1 cycle if idle, 0 if already “owned” by the source (s0 or s1).
Sync valid from m side to s0/s1 sync valid (back to originating manager): 0 cycles.
Sync ready to idle (delay after finishing transaction response): 1 cycle.
Thus, a minimum of 2 cycles per command when changing managers, or 3 when accounting for sync (this is the worst case).
1 cycle per command (back-to-back commands are accepted) when they come from the same manager.
Thus, a minimum of \(2+N_{cmd}\) cycles per burst of \(N_{cmd}\) commands from the same source.
s0/s1_sdo_valid to m_sdo_valid: 0 cycles if already “owned” by the source (s0 or s1). Otherwise, it has to wait until s0/s1 owns the channel.
m_sdi_valid to s0/s1_sdi_valid: 0 cycles if already “owned” by the sink (s0 or s1). Otherwise, it has to wait until s0/s1 owns the channel.
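The interconnect figures above can be compared with a small Python sketch. It assumes that the 2 extra cycles in \(2+N_{cmd}\) are the 1 cycle to take the channel from idle plus the 1 cycle from sync ready back to idle, and that a stream which changes managers on every command pays the 3-cycle worst case each time. Both function names are hypothetical.

```python
def burst_cycles(n_cmd):
    """Minimum cycles for a burst of n_cmd commands from one source: 2 + n_cmd."""
    if n_cmd < 1:
        raise ValueError("a burst needs at least one command")
    return 2 + n_cmd


def alternating_cycles(n_cmd):
    """Worst case when every command changes manager: 3 cycles per command."""
    if n_cmd < 1:
        raise ValueError("need at least one command")
    return 3 * n_cmd


# Compare 8 commands sent as one burst against 8 commands that each change manager.
print(burst_cycles(8), alternating_cycles(8))
```

For a single command both functions give 3 cycles, matching the worst-case figure listed above; the gap grows as more commands are kept on the same manager.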

Execution Module

All the delays measured for this module are in terms of SPI Engine clock cycles. See also the Instruction Execution section above.

Every instruction requires a minimum of 1 cycle for communication between the Offload Module and the Execution Module. The Chip Select, Sleep, Transfer and Sync instructions require one more cycle to check the idle condition, for a fixed delay of 2 cycles in total.
- Chip Select, Sleep and Transfer have additional cycle requirements due to intentional delays in execution. These are detailed in the Instruction Execution section.
SDI data delay: 0 cycles (sdi_data_valid arrives in the same cycle in which the Transfer instruction finishes and the next command is accepted).

AXI Module

AXI transaction to take effect internally: 1 cycle (AXI clock).
- This is the AXI delay to use when counting the delay to other parts of the design (e.g. the command FIFO). Other AXI delays affect only AXI throughput, creating backpressure for the AXI manager.
Throughput: 4 cycles (AXI clock) per transaction.
Command FIFO delay: depends on parametrization:
- Synchronous, 1 deep: 1 cycle (AXI clock = SPI Engine clock).
- Asynchronous, 1 deep: 1 cycle (AXI clock) + 1-2 cycles (SPI Engine clock, 2FF CDC) from input to output; then 1-2 cycles (AXI clock, 2FF CDC) until ready to accept the next entry.
- Asynchronous, true FIFO: 2 cycles (AXI clock, mem write + bin2gray addr) + 1-2 cycles (SPI Engine clock, 2FF CDC) + 2 cycles (SPI Engine clock, gray2bin + valid).
AXI transaction start to command valid (total for the asynchronous true FIFO case): 3 cycles (AXI clock) + 3-4 cycles (SPI Engine clock).
SDO Data FIFO delay: same as Command FIFO.
SDI Data FIFO delay: depends on parametrization:
- Synchronous, 1 deep: 1 cycle (AXI clock = SPI Engine clock).
- Asynchronous, 1 deep: 1 cycle (SPI Engine clock) + 1-2 cycles (AXI clock, 2FF CDC) from input to output; then 1-2 cycles (SPI Engine clock, 2FF CDC) until ready to accept the next entry.
- Asynchronous, true FIFO: 2 cycles (SPI Engine clock, mem write + bin2gray addr) + 1-2 cycles (AXI clock, 2FF CDC) + 2 cycles (AXI clock, gray2bin + valid).
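Because the AXI-side figures mix two clock domains, it can help to convert them to time. The Python sketch below does this for the path from AXI transaction start to command valid, using the Command FIFO input-to-output delays listed above plus the 1 AXI cycle for the transaction to take effect. The table keys, the function name and the clock frequencies are assumptions for illustration. Only the asynchronous true FIFO total (3 AXI + 3-4 SPI Engine cycles) is stated explicitly above; the other two rows are extrapolated in the same way.

```python
# Command FIFO input-to-output delay per parametrization:
# (AXI clock cycles, (min, max) SPI Engine clock cycles).
COMMAND_FIFO_DELAY = {
    "sync_1_deep": (1, (0, 0)),   # single clock domain: AXI clock = SPI Engine clock
    "async_1_deep": (1, (1, 2)),  # 1 AXI + 1-2 SPI Engine (2FF CDC)
    "async_fifo": (2, (3, 4)),    # 2 AXI + 1-2 SPI Engine (2FF CDC) + 2 SPI Engine
}

AXI_WRITE_CYCLES = 1  # AXI transaction to take effect internally


def axi_to_command_valid_ns(fifo, axi_mhz, spi_mhz):
    """Return (min_ns, max_ns) from AXI transaction start to command valid."""
    axi_cycles, (spi_min, spi_max) = COMMAND_FIFO_DELAY[fifo]
    axi_period_ns = 1000.0 / axi_mhz
    spi_period_ns = 1000.0 / spi_mhz
    axi_ns = (AXI_WRITE_CYCLES + axi_cycles) * axi_period_ns
    return (axi_ns + spi_min * spi_period_ns, axi_ns + spi_max * spi_period_ns)


# Example with assumed clocks: 100 MHz AXI, 200 MHz SPI Engine.
lo_ns, hi_ns = axi_to_command_valid_ns("async_fifo", axi_mhz=100, spi_mhz=200)
print(lo_ns, hi_ns)
```

With the assumed clocks, the asynchronous true FIFO case works out to between 45 ns and 50 ns: 3 AXI cycles of 10 ns each, plus 3 to 4 SPI Engine cycles of 5 ns each.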