Digital logic: I often call logic without a clock "combinatorial logic". That term is correct in itself, but the more common term is "combinational logic", see
https://en.wikipedia.org/wiki/Combinational_logic . I will change that in the documentation. Logic with a clock is called "sequential logic". Together they are called "digital logic".
The difference is that sequential logic has memory (flip-flop registers) and combinational logic does not. The combinational logic determines the function of the logic and the sequential logic holds the state. In addition, sequential logic can also serve for pipelining. Pipelining introduces latency and is (often) needed to meet the clock timing, i.e. to ensure that the combinational logic output is always stable within one clock cycle. Below I have sketched how function, state and pipelining can be kept cleanly separated when implementing register transfer level (RTL) code.
Idea / rule: Distinguish between state registers and pipeline registers.
. The state registers keep the state of the function and the function itself is programmed in combinatorial logic.

Eric Kooistra
committed
In this way the pipelining that is needed to achieve timing closure can be added independently of the function.
This approach could be described in a paper, because it is quite significant and differs from the well known
Gaisler approach (that uses RL = 1 and does not separate state from pipeline). AXI uses RL = 0, but it still
needs to be checked how it then handles pipelining.
. Components need pipelining to achieve timing closure. This pipelining causes a latency in the data
stream. This latency is typically no problem, because it only delays the output. If components need
flow control then the stream has a siso backpressure signal that must have a certain timing relation
to the sosi data signal. This timing relation is the ready latency (RL) and the RL can be >= 0. For
RL = 0 the ready signal acts as a data acknowledge and for RL > 0 the ready signal acts as a data
request signal. Adding pipelining to the sosi data increases the RL.
. The RL is explained in the Avalon specification. Examples of RL = 0 are so-called look-ahead (Altera)
or first-word fall-through (Xilinx) FIFOs. In our UniBoard applications we use RL = 1. For most parts
of the design we try not to use flow control. I think that AXI stream uses RL = 0.
. The function operates with ready latency (RL) = 0, if it is combinatorial. If the stream has no flow
control then the pipeline is achieved as an output register stage. If the stream does need flow control,
then this output register stage increases the RL by 1. To restore the RL to 0 a dp_latency_adapter.vhd
is needed. This latency adapter also registers the ready, so it provides pipelining for both the output
stream sosi data as well as the output stream siso ready flow control.
. For new components the development approach is to implement the function for RL = 0, so with only the
state registers. If the component does not use flow control, then it may still just wire the flow control
from output to input. If the component does use flow control, then it can combinatorially impose this
on the incoming flow control and pass the combined flow control on to its input. For timing closure
the pipelining is added as a separate stage. Either pipeline sosi if no flow control is needed
or pipeline siso if flow control is needed. For example: dp_block_resize.vhd, dp_block_select.vhd,
dp_counter.vhd.
. Components that do not need input flow control can support external flow control by simply wiring the
output_siso to the input_siso.
. Components that do need input flow control can OR their input flow control with the external flow control
and wire that to the input_siso.
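
The separation of function and pipelining described above can be sketched in VHDL as follows. This is only a minimal sketch under assumptions: the entity example_func_pipe, its generics and its plain STD_LOGIC ports are hypothetical names chosen for illustration (they are not RadioHDL components and do not use the sosi/siso record types), and the bitwise invert is an arbitrary stand-in for the real function.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical example entity, not part of any existing library.
ENTITY example_func_pipe IS
  GENERIC (
    g_pipeline : BOOLEAN := TRUE;  -- TRUE adds one output register stage
    g_data_w   : NATURAL := 8
  );
  PORT (
    clk       : IN  STD_LOGIC;
    rst       : IN  STD_LOGIC;
    -- input stream (data in, ready out)
    in_data   : IN  STD_LOGIC_VECTOR(g_data_w-1 DOWNTO 0);
    in_valid  : IN  STD_LOGIC;
    in_ready  : OUT STD_LOGIC;
    -- output stream (data out, ready in)
    out_data  : OUT STD_LOGIC_VECTOR(g_data_w-1 DOWNTO 0);
    out_valid : OUT STD_LOGIC;
    out_ready : IN  STD_LOGIC
  );
END example_func_pipe;

ARCHITECTURE rtl OF example_func_pipe IS
  -- Function result, combinatorial so at RL = 0
  SIGNAL func_data  : STD_LOGIC_VECTOR(g_data_w-1 DOWNTO 0);
  SIGNAL func_valid : STD_LOGIC;
BEGIN
  -- Function: purely combinatorial (arbitrary placeholder operation)
  func_data  <= NOT in_data;
  func_valid <= in_valid;

  -- The function needs no flow control itself, so the external flow
  -- control is simply wired from output to input
  in_ready <= out_ready;

  -- No pipelining: the output is the combinatorial function result
  no_pipe : IF NOT g_pipeline GENERATE
    out_data  <= func_data;
    out_valid <= func_valid;
  END GENERATE;

  -- Pipelining: separate output register stage, independent of the function
  gen_pipe : IF g_pipeline GENERATE
    p_pipe : PROCESS(clk, rst)
    BEGIN
      IF rst = '1' THEN
        out_data  <= (OTHERS => '0');
        out_valid <= '0';
      ELSIF rising_edge(clk) THEN
        out_data  <= func_data;
        out_valid <= func_valid;
      END IF;
    END PROCESS;
  END GENERATE;
END rtl;
```

With g_pipeline = TRUE and a stream that really uses the ready signal, this output register stage increases the RL by 1, as noted above, so a latency adapter would then be needed to restore RL = 0.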

Ref:
$RADIOHDL/tools/oneclick/doc/desp_firmware_dag_erko.txt
HDL coding: Useful documents with fundamental knowledge about digital logic in FPGAs and ASICs:
- Memory mapped RAM and registers and clock domain crossing. Thanks to these standard components we
can run the mm_clk at another rate than the dp_clk:
https://svn.astron.nl/UniBoard_FP7/RadioHDL/trunk/libraries/base/common/doc/ASTRON_RP_415_common_mem.pdf
https://svn.astron.nl/UniBoard_FP7/UniBoard/trunk/Firmware/modules/Lofar/async_logic/doc/async_logic.pdf
About metastability and asynchronous logic. This doc contains solutions for:
. synchronizing a reset to a clock domain,
. transferring a level signal between clock domains
. transferring a pulse signal between clock domains
It also contains a study that I did to understand how the control of a dual clock FIFO works. Typically
we use an IP component as FIFO, but I think the async_fifo RTL code would also work on HW, since it
does work in simulation.
- RTL combinatorial D --> D, rising_edge D --> Q
. complicates functional thinking because it mixes combinatorial (valid now) and clocked (valid one
cycle later) signals
. latency of D --> Q complicates backpressure (RL = 1)
. introduces non-functional pipeline, mixes state reg and pipeline reg
Gaisler structures this RTL D --> Q coding style, but does not solve these complications.
Gaisler uses variables, but does not structure the use of variables.
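
For the transfer of a level signal between clock domains that is listed above, a common textbook solution is a chain of two flip-flops in the destination clock domain. The sketch below shows that generic technique; it is not necessarily the solution described in async_logic.pdf, and the entity name example_level_sync and its port names are hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical example entity: two flip-flop synchronizer for a level signal.
ENTITY example_level_sync IS
  PORT (
    out_clk : IN  STD_LOGIC;  -- destination clock domain
    out_rst : IN  STD_LOGIC;  -- reset, assumed synchronous to out_clk on release
    in_lvl  : IN  STD_LOGIC;  -- level signal from the other clock domain
    out_lvl : OUT STD_LOGIC   -- level signal in the out_clk domain
  );
END example_level_sync;

ARCHITECTURE rtl OF example_level_sync IS
  SIGNAL meta : STD_LOGIC;  -- first stage, may go metastable
  SIGNAL sync : STD_LOGIC;  -- second stage, gives meta one clock cycle to settle
BEGIN
  p_sync : PROCESS(out_clk, out_rst)
  BEGIN
    IF out_rst = '1' THEN
      meta <= '0';
      sync <= '0';
    ELSIF rising_edge(out_clk) THEN
      meta <= in_lvl;
      sync <= meta;
    END IF;
  END PROCESS;

  out_lvl <= sync;
END rtl;
```

This sketch assumes that in_lvl is driven directly by a register in the source clock domain and that it changes slowly compared to out_clk. A pulse or a multi-bit value needs a different scheme.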
I see two levels:
1) If the FPGA synthesis & timing report that the design is OK, then the FPGA logic is error-free
(i.e. what you code is what you get).
2) If we design, implement and test well, make sure that FIFOs do not overflow, and make sure
that the packets that come in from outside the FPGA are correct (e.g. by means of CRC ok), then the
block processing is error-free (i.e. what you want is what you get). Internally we can then always
assume that the blocks of data are correct, so we do not need to do any further checks internally.
Design steps:
Make detailed design document with:
- requirements and assumptions
- design decisions
- context : environment, use cases
- architecture :
. dut block diagram with instances and anticipated processes,
. entity with generics, ports, MM interfaces.
- verification : sim on HW
. list of generic ranges to cover
. list of input cases to cover
. tb block diagram with dut instance, other anticipated instances and processes
Implementation steps:
- prepare empty entity and tb files (dut, mmp_, tb_, tb_tb_)
- DUT:
. instantiate the reused components from the block diagram and wire them
. t_reg : gradually add the functional state signals that are needed
. -- State registers
  SIGNAL r     : t_reg;
  SIGNAL nxt_r : t_reg;
. -- Pipeline registers
  SIGNAL in_data_p : ...
. p_reg : PROCESS(dp_clk, dp_rst)
  BEGIN
    IF dp_rst='1' THEN
      r <= c_reg_rst;
    ELSIF rising_edge(dp_clk) THEN
      r <= nxt_r;
    END IF;
  END PROCESS;
. p_comb : PROCESS(r, ...)  -- single process for all combinatorial logic
    -- complete sensitivity list contains all signals that are read, so that are
    -- . at right of assignment statement <= , :=
    -- . in the condition of an IF statement
    -- State variable
    VARIABLE v : t_reg;
    -- Auxiliary variables
    VARIABLE v_*  -- optional, use to improve code readability
    -- use v. only on left side? use separate v_* to clearly indicate when we use it also on the right side of assignments ?
  BEGIN
    v := r;       -- default keep existing state
    v.* := ...;   -- default force specific values, e.g. set strobes to '0',
                  -- typically do not force data, but leave it as it is to avoid
                  -- unnecessary toggling and to ease viewing data value in Wave window
    -- implement the processes from the block diagram to determine nxt_r
    -- . this is where creativity is needed, however a good design will
    --   guide the implementation
    -- . use of v and r
    -- next state
    nxt_r <= v;
  END PROCESS;
. -- Pipelining
- TB: Test the functionality of the DUT:
. start with easiest, default use case,
. then verify the additional functions and features,
. then verify the corner cases (e.g. 0, 1, some prime value, smallest, largest),
. check that the TB did indeed run (i.e. nothing happened must not be regarded as passed),
- TB_TB_: Multi test bench that runs multiple TB in parallel to achieve coverage
. multiple TB instances in parallel, each with other generic settings
. typically it is easier to run multiple TB in parallel than to run the tests
that they do sequentially in one TB
- MMP_DUT: DUT with MM interfaces
- TB_MMP_DUT: Testbench with focus on MM interface access only, so typically no
need for a TB_TB_MMP.
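
The t_reg, p_reg and p_comb template from the DUT step above could be filled in as in the following minimal sketch. The entity example_block_strobe, its generic g_block_size and its ports are hypothetical and only serve as illustration: the component counts valid samples and marks the first and last sample of each block of g_block_size samples. The outputs are taken from the state register r here, so the strobes appear one clock cycle after the corresponding in_valid. No separate pipeline stage is included.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical example entity, for illustration of the template only.
ENTITY example_block_strobe IS
  GENERIC (
    g_block_size : POSITIVE := 4  -- number of valid samples per block
  );
  PORT (
    dp_clk   : IN  STD_LOGIC;
    dp_rst   : IN  STD_LOGIC;
    in_valid : IN  STD_LOGIC;
    out_sop  : OUT STD_LOGIC;  -- start of block strobe
    out_eop  : OUT STD_LOGIC   -- end of block strobe
  );
END example_block_strobe;

ARCHITECTURE rtl OF example_block_strobe IS
  TYPE t_reg IS RECORD
    cnt : NATURAL RANGE 0 TO g_block_size-1;  -- sample index within the block
    sop : STD_LOGIC;
    eop : STD_LOGIC;
  END RECORD;

  CONSTANT c_reg_rst : t_reg := (cnt => 0, sop => '0', eop => '0');

  -- State registers
  SIGNAL r     : t_reg;
  SIGNAL nxt_r : t_reg;
BEGIN
  out_sop <= r.sop;
  out_eop <= r.eop;

  p_reg : PROCESS(dp_clk, dp_rst)
  BEGIN
    IF dp_rst = '1' THEN
      r <= c_reg_rst;
    ELSIF rising_edge(dp_clk) THEN
      r <= nxt_r;
    END IF;
  END PROCESS;

  p_comb : PROCESS(r, in_valid)  -- all signals that are read
    VARIABLE v : t_reg;
  BEGIN
    v := r;        -- default keep existing state
    v.sop := '0';  -- default force the strobes to '0'
    v.eop := '0';

    IF in_valid = '1' THEN
      IF r.cnt = 0 THEN
        v.sop := '1';
      END IF;
      IF r.cnt = g_block_size-1 THEN
        v.eop := '1';
        v.cnt := 0;
      ELSE
        v.cnt := r.cnt + 1;
      END IF;
    END IF;

    -- next state
    nxt_r <= v;
  END PROCESS;
END rtl;
```

In this sketch r is only read and v is only written inside p_comb, which is one possible answer to the question in the template about using v on the left side only.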