erko_hdl_design_article.txt

Digital logic: I often call logic without a clock "combinatorial logic". That term is correct in itself, but the more common term is "combinational logic", see
https://en.wikipedia.org/wiki/Combinational_logic . I will change that in the documentation. Logic with a clock is called "sequential logic". Together they are called "digital logic".

The difference is that sequential logic has memory (flip-flop registers) and combinational logic does not. The combinational logic determines the function of the logic and the sequential logic holds the state. In addition, sequential logic can serve for pipelining. Pipelining introduces latency and is (often) needed to meet the clock timing, i.e. to ensure that the combinational logic output is always stable within one clock cycle. Below I have sketched how function, state and pipelining can be kept cleanly separated when implementing register transfer level (RTL) code.

Idea / rule: Distinguish between state registers and pipeline registers.

. The state registers keep the state of the function and the function itself is programmed in combinatorial logic.
  In this way the pipelining that is needed to achieve timing closure can be added independent of the function.
  This approach could be described in a paper, because it is quite significant and differs from the well known
  Gaisler approach (that uses RL = 1 and does not separate state from pipeline). AXI uses RL = 0, but I still
  need to check how it then handles pipelining.
. Components need pipelining to achieve timing closure. This pipelining causes a latency in the data
  stream. This latency is typically no problem, because it only delays the output. If components need
  flow control then the stream has a siso backpressure signal that must have a certain timing relation
  to the sosi data signal. This timing relation is the ready latency (RL) and the RL can be >= 0. For 
  RL = 0 the ready signal acts as a data acknowledge and for RL > 0 the ready signal acts as a data
  request signal. Adding pipelining to the sosi data increases the RL.
. The RL is explained in the Avalon specification. Examples of RL = 0 are the so-called show-ahead (also
  called look-ahead, Altera) or first word fall through (Xilinx) FIFOs. In our UniBoard applications we use
  RL = 1. For most parts of the design we try not to use flow control. I think that AXI4-Stream uses RL = 0.
. The function operates with ready latency (RL) = 0, if it is combinatorial. If the stream has no flow
  control then the pipeline is achieved as an output register stage. If the stream does need flow control,
  then this output register stage increases the RL by 1. To restore the RL to 0 a dp_latency_adapter.vhd
  is needed. This latency adapter also registers the ready, so it provides pipelining for both the output
  stream sosi data  as well as the output stream siso ready flow control.
. For new components the development approach is to implement the function for RL = 0, so only with the state
  registers. If the component does not use flow control, then it may still just wire the flow control
  from output to input. If the component does use flow control, then it can combinatorially impose this
  on the incoming flow control and pass the combined flow control on to its input. For timing closure
  the pipelining is added as a separate stage: either pipeline the sosi if no flow control is needed,
  or pipeline the siso if flow control is needed. For example: dp_block_resize.vhd, dp_block_select.vhd,
  dp_counter.vhd.
. Components that do not need input flow control can support external flow control by simply wiring the 
  output_siso to the input_siso.
. Components that do need input flow control can combine their own flow control with the external flow
  control (OR the backpressure, which is the same as AND-ing the ready signals) and wire that to the
  input_siso.
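
The RL = 0 rules above can be sketched as follows. This is only a minimal illustration under
assumptions: the entity name rl0_gap_example, the generic g_nof_words and the plain std_logic
ports are hypothetical and are used instead of the sosi/siso record types to keep the sketch
self-contained. The function inserts one idle cycle after every g_nof_words words, so it needs
its own input flow control (own_ready), which it combines with the external out_ready. There is
no pipeline register; pipelining would be added as a separate stage.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical example entity, not an existing RadioHDL component.
ENTITY rl0_gap_example IS
  GENERIC (
    g_nof_words : POSITIVE := 4   -- insert one idle cycle after this many words
  );
  PORT (
    dp_rst    : IN  STD_LOGIC;
    dp_clk    : IN  STD_LOGIC;
    -- input stream (sosi: data, valid; siso: ready)
    in_data   : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    in_valid  : IN  STD_LOGIC;
    in_ready  : OUT STD_LOGIC;
    -- output stream (sosi: data, valid; siso: ready)
    out_data  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
    out_valid : OUT STD_LOGIC;
    out_ready : IN  STD_LOGIC
  );
END rl0_gap_example;

ARCHITECTURE rtl OF rl0_gap_example IS
  -- State registers only, no pipeline registers
  TYPE t_reg IS RECORD
    cnt : NATURAL RANGE 0 TO g_nof_words;
  END RECORD;
  CONSTANT c_reg_rst : t_reg := (cnt => 0);

  SIGNAL r         : t_reg;
  SIGNAL nxt_r     : t_reg;
  SIGNAL own_ready : STD_LOGIC;
BEGIN
  p_reg : PROCESS(dp_clk, dp_rst)
  BEGIN
    IF dp_rst = '1' THEN
      r <= c_reg_rst;
    ELSIF rising_edge(dp_clk) THEN
      r <= nxt_r;
    END IF;
  END PROCESS;

  -- Own flow control: not ready during the gap cycle
  own_ready <= '0' WHEN r.cnt = g_nof_words ELSE '1';

  -- RL = 0: ready and valid refer to the same clock cycle.
  -- Combine own flow control with the external flow control (AND of readys).
  in_ready  <= out_ready AND own_ready;
  out_data  <= in_data;
  out_valid <= in_valid AND own_ready;

  p_comb : PROCESS(r, in_valid, out_ready, own_ready)
    VARIABLE v : t_reg;
  BEGIN
    v := r;  -- default keep existing state
    IF own_ready = '0' THEN
      v.cnt := 0;            -- gap cycle done, start counting again
    ELSIF in_valid = '1' AND out_ready = '1' THEN
      v.cnt := r.cnt + 1;    -- one word transferred
    END IF;
    nxt_r <= v;
  END PROCESS;
END rtl;
```

A component without its own flow control would reduce to in_ready <= out_ready, which is the
plain wiring of output_siso to input_siso described above.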

  
$RADIOHDL_WORK/applications/lofar2/doc/prestudy/

Ref:
 $RADIOHDL/tools/oneclick/doc/desp_firmware_dag_erko.txt
 $RADIOHDL/tools/oneclick/doc/desp_firmware_overview.txt


HDL coding: Useful documents with fundamental knowledge about digital logic in FPGAs and ASICs:
  - Memory mapped RAM and registers and clock domain crossing. Thanks to these standard components we
    can run the mm_clk at another rate than the dp_clk:
    https://svn.astron.nl/UniBoard_FP7/RadioHDL/trunk/libraries/base/common/doc/ASTRON_RP_415_common_mem.pdf
    https://svn.astron.nl/UniBoard_FP7/UniBoard/trunk/Firmware/modules/Lofar/async_logic/doc/async_logic.pdf
    About metastability and asynchronous logic. This doc contains solutions for:
    . synchronizing a reset to a clock domain,
    . transferring a level signal between clock domains,
    . transferring a pulse signal between clock domains.
    It also contains a study that I did to understand how the control of a dual clock FIFO works. Typically
    we use an IP component as FIFO, but I think the async_fifo RTL code would also work on HW; it does
    work in simulation.
  - RTL combinatorial D --> D, rising_edge D --> Q
    . complicates functional thinking because it mixes combinatorial (valid now) and clocked (valid one
      cycle later) signals
    . latency of D --> Q complicates backpressure (RL = 1)
    . introduces non-functional pipeline, mixes state reg and pipeline reg
    Gaisler structures this RTL D --> Q coding style, but does not solve these complications
    Gaisler uses variables, but does not structure the use of variables
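
As an illustration of the first two clock domain crossing items listed above (synchronizing a
reset and transferring a level signal), here is a minimal sketch of the generic textbook
technique. It is not taken from async_logic.pdf; the entity name cdc_sync_example, the generic
g_nof_stages and the port names are hypothetical. Transferring a pulse needs more than this
(e.g. a toggle or handshake scheme) and is not shown.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical example entity, not an existing RadioHDL component.
ENTITY cdc_sync_example IS
  GENERIC (
    g_nof_stages : NATURAL RANGE 2 TO 8 := 2   -- number of synchronizer flip-flops
  );
  PORT (
    arst      : IN  STD_LOGIC;  -- asynchronous reset, active high
    clk       : IN  STD_LOGIC;  -- destination clock domain
    in_level  : IN  STD_LOGIC;  -- level signal from another clock domain
    rst       : OUT STD_LOGIC;  -- reset, released synchronously to clk
    out_level : OUT STD_LOGIC   -- level signal in the clk domain
  );
END cdc_sync_example;

ARCHITECTURE rtl OF cdc_sync_example IS
  -- Initial values are for simulation only
  SIGNAL rst_meta : STD_LOGIC_VECTOR(0 TO g_nof_stages-1) := (OTHERS => '1');
  SIGNAL lvl_meta : STD_LOGIC_VECTOR(0 TO g_nof_stages-1) := (OTHERS => '0');
BEGIN
  -- Reset: asserted asynchronously, released after g_nof_stages clk edges
  p_rst : PROCESS(clk, arst)
  BEGIN
    IF arst = '1' THEN
      rst_meta <= (OTHERS => '1');
    ELSIF rising_edge(clk) THEN
      rst_meta <= '0' & rst_meta(0 TO g_nof_stages-2);
    END IF;
  END PROCESS;

  rst <= rst_meta(g_nof_stages-1);

  -- Level: shift through g_nof_stages flip-flops so that a metastable first
  -- stage has time to settle before the value is used
  p_level : PROCESS(clk)
  BEGIN
    IF rising_edge(clk) THEN
      lvl_meta <= in_level & lvl_meta(0 TO g_nof_stages-2);
    END IF;
  END PROCESS;

  out_level <= lvl_meta(g_nof_stages-1);
END rtl;
```

The level synchronizer is only safe for a single bit that changes slowly compared to clk; a
multi-bit value needs a different scheme such as a dual clock FIFO.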

I see two levels:
1) If FPGA synthesis and timing analysis report that the design is OK, then the FPGA logic is error free
  (i.e. what you code is what you get).
2) If we design, implement and test well, make sure that FIFOs do not overflow, and make sure
   that the packets that enter from outside the FPGA are correct (e.g. by means of a CRC check), then the
   block processing is error free (i.e. what you want is what you get). Internally we can then always
   assume that the blocks of data are correct, so no further internal checks are needed.


Design steps:
Make detailed design document with:
- requirements and assumptions
- design decisions
- context : environment, use cases
- architecture :
  . dut block diagram with instances and anticipated processes,
  . entity with generics, ports, MM interfaces.
- verification : sim on HW
  . list of generic ranges to cover
  . list of input cases to cover
  . tb block diagram with dut instance, other anticipated instances and processes

Implementation steps:
- prepare empty entity and tb files (dut, mmp_, tb_, tb_tb_)
- DUT:
  . instantiate the reused components from the block diagram and wire them
  . t_reg : gradually add the functional state signals that are needed

  . -- State registers
    SIGNAL r             : t_reg;
    SIGNAL nxt_r         : t_reg;

  . -- Pipeline registers
    SIGNAL in_data_p     : ...

  . p_reg : PROCESS(dp_clk, dp_rst)
    BEGIN
      IF dp_rst='1' THEN
        r <= c_reg_rst;
      ELSIF rising_edge(dp_clk) THEN
        r <= nxt_r;
      END IF;
    END PROCESS;

  . p_comb : PROCESS(r, ...)  -- single process for all combinatorial logic
                              -- complete sensitivity list contains all signals that are read, i.e. signals
                              -- . on the right side of an assignment statement <= , :=
                              -- . in the condition of an IF statement
      -- State variable
      VARIABLE v : t_reg;
      -- Auxiliary variables
      VARIABLE v_*   -- optional, use to improve code readability
                     -- use v. only on left side? use separate v_* to clearly indicate
                     -- when we use it also on the right side of assignments?
    BEGIN
      v := r;      -- default keep existing state
      v.* := ...;  -- default force specific values, e.g. set strobes to '0',
                   -- typically do not force data, but leave it as it is to avoid
                   -- unnecessary toggling and to ease viewing data value in Wave window

      -- implement the processes from the block diagram to determine nxt_r
      -- . this is where creativity is needed, however a good design will
      --   guide the implementation
      -- . use of v and r

      -- next state
      nxt_r <= v;
    END PROCESS;

  . -- Pipelining

- TB: Test the functionality of the DUT:
  . start with easiest, default use case,
  . then verify the additional functions and features,
  . then verify the corner cases (e.g. 0, 1, some prime value, smallest, largest),
  . check that the TB did indeed run (i.e. "nothing happened" must not be regarded as passed),

- TB_TB_: Multi test bench that runs multiple TB in parallel to achieve coverage
  . multiple TB instances in parallel, each with other generic settings
  . typically it is easier to run multiple TB in parallel than to run the tests
    that they do sequentially in one TB

- MMP_DUT: DUT with MM interfaces

- TB_MMP_DUT: Testbench with focus on MM interface access only, so typically no
  need for a TB_TB_MMP.
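
The DUT template from the implementation steps can be filled in as follows. This is a minimal
sketch under assumptions, not an existing component: the entity name block_marker_example, the
generics g_block_size and g_pipeline, the signal func_eop and the plain std_logic ports are
hypothetical. The function marks the last word of every block of g_block_size valid words. The
state (the word counter) lives in r, the function is combinatorial in p_comb, and the optional
output pipeline stage is kept separate. The stream has no flow control, so the sosi is pipelined.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical example entity, not an existing RadioHDL component.
ENTITY block_marker_example IS
  GENERIC (
    g_block_size : POSITIVE := 4;    -- number of valid words per block
    g_pipeline   : BOOLEAN  := TRUE  -- TRUE adds one output register stage
  );
  PORT (
    dp_rst    : IN  STD_LOGIC;
    dp_clk    : IN  STD_LOGIC;
    in_data   : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    in_valid  : IN  STD_LOGIC;
    out_data  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
    out_valid : OUT STD_LOGIC;
    out_eop   : OUT STD_LOGIC   -- '1' at the last word of a block
  );
END block_marker_example;

ARCHITECTURE rtl OF block_marker_example IS
  -- State registers
  TYPE t_reg IS RECORD
    cnt : NATURAL RANGE 0 TO g_block_size-1;
  END RECORD;
  CONSTANT c_reg_rst : t_reg := (cnt => 0);

  SIGNAL r        : t_reg;
  SIGNAL nxt_r    : t_reg;

  -- Combinatorial function output
  SIGNAL func_eop : STD_LOGIC;
BEGIN
  p_reg : PROCESS(dp_clk, dp_rst)
  BEGIN
    IF dp_rst = '1' THEN
      r <= c_reg_rst;
    ELSIF rising_edge(dp_clk) THEN
      r <= nxt_r;
    END IF;
  END PROCESS;

  p_comb : PROCESS(r, in_valid)
    VARIABLE v : t_reg;
  BEGIN
    v := r;            -- default keep existing state
    func_eop <= '0';   -- default no strobe
    IF in_valid = '1' THEN
      IF r.cnt = g_block_size-1 THEN
        v.cnt := 0;
        func_eop <= '1';  -- last word of the block
      ELSE
        v.cnt := r.cnt + 1;
      END IF;
    END IF;
    nxt_r <= v;        -- next state
  END PROCESS;

  -- Pipelining, separate from the function and the state
  no_pipe : IF NOT g_pipeline GENERATE
    out_data  <= in_data;
    out_valid <= in_valid;
    out_eop   <= func_eop;
  END GENERATE;

  gen_pipe : IF g_pipeline GENERATE
    p_pipe : PROCESS(dp_clk, dp_rst)
    BEGIN
      IF dp_rst = '1' THEN
        out_data  <= (OTHERS => '0');
        out_valid <= '0';
        out_eop   <= '0';
      ELSIF rising_edge(dp_clk) THEN
        out_data  <= in_data;
        out_valid <= in_valid;
        out_eop   <= func_eop;
      END IF;
    END PROCESS;
  END GENERATE;
END rtl;
```

Switching g_pipeline changes only the latency of the output, not the function, which is the
point of keeping the state registers and the pipeline registers apart.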