From 0021dd314f6f166c80944d2759cd0b4f33b2b276 Mon Sep 17 00:00:00 2001
From: Eric Kooistra <kooistra@astron.nl>
Date: Thu, 5 Nov 2020 14:26:08 +0100
Subject: [PATCH] Copied from
 https://svn.astron.nl/UniBoard_FP7/UniBoard/trunk/Firmware/doc/howto/

---
 doc/how_to_write_VHDL.txt | 1462 +++++++++++++++++++++++++++++++++++++
 1 file changed, 1462 insertions(+)
 create mode 100644 doc/how_to_write_VHDL.txt

diff --git a/doc/how_to_write_VHDL.txt b/doc/how_to_write_VHDL.txt
new file mode 100644
index 0000000000..00d190ffc5
--- /dev/null
+++ b/doc/how_to_write_VHDL.txt
@@ -0,0 +1,1462 @@
+How to write VHDL coding style
+==============================
+
+Contents:
+1) Introduction
+2) Coding style
+3) RTL development views
+4) State machines
+5) Clocked and combinatorial process
+  a) RTL (Register Transfer Level)
+  b) No latches
+  c) Sensitivity list
+  d) Limited use of variables
+  e) Avoiding delta-delay problems with derived clocks
+  f) Default only use positive edge triggered flip-flops
+  g) Use reset to clear flip-flops only
+  h) Clock domain crossing
+  i) Register outputs
+6) No-variables method
+7) Gaisler two-process method 
+8) Directory structure
+9) Naming conventions
+  a) Language key words
+  b) Files, entities, architectures
+  c) Library directory
+  d) Constants and generics
+  e) Variables
+  f) Signals
+  g) Types
+  h) Processes, instances, generates
+  i) Procedures, functions
+  j) Packages
+  k) Do not use other HDL reserved words
+10) Coding conventions
+  a) Default use descending bus order
+  b) Default use named association for the generic map and port map
+  c) Component instantiation
+  d) Only use IEEE.std_logic_1164 and IEEE.numeric_std libraries
+  e) Do not use the BLOCK statement
+  g) Do not use CONFIGURATIONs
+  h) Avoid embedded, tool-specific synthesis commands
+  i) Use enumerate values for FSM states
+11) File layout
+  a) Copyright statement
+  b) Purpose and description
+  c) Put entity, architecture and configuration in same file
+  d) Port order
+  e) Declaration order
+  f) Avoid mixing structural and RTL description code in the same architecture
+  g) Comment
+  h) Indent and alignment
+  i) TABs and spaces - Do not use TABs
+  j) Line length
+  k) Use separate line for each statement
+  l) Place each declaration on a separate line
+12) Standard packages
+13) Use of records
+14) Use of two-dimensional arrays
+15) Use of constants, generics and packages
+16) Functions
+17) Procedures
+18) Test benches
+  a) Verifying a DUT
+  b) Test bench interface packages and components
+  c) Multi-level test bench
+  d) Self checking and self stopping
+  e) Python test case using MM interface
+19) DP streaming component development example
+20) Simulation and synthesis debugging
+
+
+1) Introduction
+
+The basic idea of the VHDL coding style discussed here is that the structure
+and naming of the code closely reflects the task that the code performs. In
+summary the key aspects of this VHDL coding style are:
+
+ * a structured hierarchy of the design
+ * a structured way of naming all elements in the VHDL
+ 
+The VHDL coding style may be used as a cosmetic step during or at the end of a
+VHDL implementation. However, the coding style can also be used as an integral
+part of the development, not only after the coding has been done but right from
+the start and during all phases of the coding. In this way the coding style
+becomes more than just a matter of cosmetics; it then is a valuable tool for
+developing robust and correct code.
+
+Something about VHDL and digital design:
+
+VHDL has its flaws (e.g. why is the INTEGER type restricted to 32 bits?), but it
+is a good language for describing and implementing digital designs.
+Writing VHDL may look like programming software in e.g. C, but it is
+fundamentally different. In particular, writing VHDL differs from sequential
+programming because digital designs are:
+
+- massively parallel, e.g. each register, RAM or LUT (lookup table) acts as a
+  parallel entity
+- restricted by low level details like e.g. multiple clock domains and hardware
+  specific resources
+
+Some more aspects of digital design are:
+
+- Digital design often needs to cover the entire range from the high-level
+  application function to the low-level physical aspects of the targeted
+  hardware. Modular design and the use of hierarchy help to separate these two
+  levels. However, the targeted hardware often changes with every new project,
+  which means that digital design will inherently always also involve
+  low-level implementation developments.
+- The VHDL language offers many language constructs, but not all of these are
+  possible or wise to use for describing hardware.
+
+Using a proper coding style helps to cope with these aspects of VHDL and digital
+design and to make good code that will work on the hardware (in time and with
+quality). Note that many of the coding style aspects discussed here in fact
+apply to any programming language (C, MATLAB, Python, TCL, BASH, Erlang, ...).
+
+
+
+2) Coding style
+  
+The subsequent sections describe the coding style. Following a coding style is
+not a burden or something to do afterwards in a code clean-up session. Instead,
+following a coding style is an important development method at every stage of
+writing the code. Proper code without loose ends is important because, in
+combination with some testing, it ensures that the code is correct. Dirty code
+may pass the testing as well, but it does not give the same level of
+confidence that the code is correct, because it is often impossible to cover
+all conditions in the tests. Therefore proper coding is an integral part of
+robust development. At the end of the development you have polished code that
+performs what it needs to do and a test bench to confirm that it is correct.
+Like a piece of music, the code and the comments should have no false notes.
+The code should flow like a melody and all parts should be in harmony.
+ 
+At intermediate steps during coding, the sources should be committed to SVN.
+Such a step can e.g. be a cosmetic improvement, a correction or an added
+feature. The SVN commits can be seen as the hooks that a mountaineer uses to
+secure the climb. The hooks (SVN commits) help to ensure that we develop in
+atomic steps. The hooks are useful for doing a 'diff' to highlight the changes
+or to see the progress by reading the commit comments. Typically we seldom need
+to fall back to the hooks (i.e. return to a previous state in SVN). The hooks
+(SVN commits) help to stay on the right track, and at any time during the
+development they show a traceable path towards the end result. By developing
+in small steps, each with clear added value, we break down a complex problem
+into easier problems.
+
+Obsolete or redundant code like declarations, assignments, comments, etc. has
+to be removed already during development, because it obscures the true working
+of the code. It is like the construction of a building: during the work the
+site needs to remain tidy, and once it is finished the scaffolding is not left
+behind.
+
+Some important aspects of the coding style are:
+
+- Use symmetry and similarity, e.g. similar functions should look similar in
+  code. This not only applies within a file, but also between files and in
+  fact throughout the entire development directory.
+- Use consistent names for related designs, entities, functions, etc. E.g.
+  there are components common_round, common_resize, common_requantize. Then
+  it would be 'wrong' to call a new related component common_clipper instead
+  of common_clip.
+- The code complexity must be equal to or less than the complexity of the
+  problem. Make use of the redundancy and don't care situations in the problem
+  to simplify the code.
+- The comment must have added value and be phrased correctly.
+- Avoid redundant code, e.g.:
+  . Move code that is the same in both the 'then' and the 'else' of an 'if'
+    statement above the 'if'
+  . Consider making a component or a function of it
+- Use minimal interfaces, i.e. structure the design such that the blocks have
+  clear tasks to do without too many interdependencies.
+- Use interface names that describe the output of a task
+- Minimize the amount of control. Only gently touch the input, as much as
+  needed to get to the required output. Too many control conditions are hard
+  to cover in testing and harder to understand. Signs of too much control, and
+  ways to reduce it, are e.g.:
+  . Any control at all (always consider whether you can do without)
+  . A conditional statement with more than 2 levels
+  . Avoid unnecessary restrictions, make use of redundancy and don't care
+    situations
+  . A power of 2 counter wraps automatically, no need to check for max
+  . Do not force invalid data to a value, let it hold its current value
+- The code should always look clear. If the code does not look harmonious or
+  looks 'ugly', then it can surely be improved. The cause can be e.g.:
+  . The function interfaces have not been placed at the right place.
+  . The problem is not clearly understood yet.
+  The solution is e.g.:
+  . Determine what the IO is and what really needs to be done and when.
+  . Break the problem down into atomic steps that each reveal a clear action
+    for solving the problem.
+  . Find the resemblance with other problems that have already been solved,
+    and identify the difference, or perhaps there is no real difference.
+  . Reuse existing components.
+  . Try to let the default behaviour be the wanted behaviour.
+- Perhaps the only excuse for an 'ugly' solution is when the input is presented
+  in an 'ugly' way and you cannot change this (e.g. because it comes from an
+  existing device). However, this can then be handled by first defining a
+  component that 'beautifies' the input, so that the rest of the problem
+  solution can again be implemented in harmonious VHDL.
+- Use names that are accurate, i.e. not too specific and not too broad
+- From input to output the code names and functionality should gradually
+  change, similar to Escher's metamorphosis drawings, where e.g. birds on one
+  side gradually change into fishes on the other side. Note e.g. the symmetry
+  and similarity of the internal streaming interface names from rx input via
+  control to tx output in eth.vhd:
+  
+  SIGNAL rx_adapt_sosi        : t_dp_sosi;
+  SIGNAL rx_crc_sosi          : t_dp_sosi;
+  SIGNAL rx_hdr_sosi          : t_dp_sosi;
+  SIGNAL rx_channel_sosi      : t_dp_sosi;
+  SIGNAL demux_sosi_arr       : t_dp_sosi_arr(0 TO c_mux_nof_ports-1);
+  SIGNAL eth_rx_sosi          : t_dp_sosi;
+  SIGNAL rx_frame_*
+  SIGNAL eth_tx_sosi          : t_dp_sosi;
+  SIGNAL mux_sosi_arr         : t_dp_sosi_arr(0 TO c_mux_nof_ports-1);
+  SIGNAL tx_mux_sosi          : t_dp_sosi;
+  SIGNAL tx_hdr_sosi          : t_dp_sosi;
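+
+As a sketch of the last two points about minimal control, the hypothetical
+entity below uses the no-variables style of section 5. It counts valid input
+words with a counter whose range is a power of 2, so the counter wraps without
+a check for max. It also holds the last valid data word instead of forcing it
+to a value when in_val='0'. All entity, generic and signal names are
+illustrative assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+  USE IEEE.numeric_std.ALL;
+
+  ENTITY minimal_ctrl_example IS  -- hypothetical example entity
+    GENERIC (
+      g_cnt_w : NATURAL := 8;     -- counter wraps at 2**g_cnt_w
+      g_dat_w : NATURAL := 16
+    );
+    PORT (
+      rst     : IN  STD_LOGIC;
+      clk     : IN  STD_LOGIC;
+      in_dat  : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      in_val  : IN  STD_LOGIC;
+      out_cnt : OUT STD_LOGIC_VECTOR(g_cnt_w-1 DOWNTO 0);
+      out_dat : OUT STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0)
+    );
+  END minimal_ctrl_example;
+
+  ARCHITECTURE rtl OF minimal_ctrl_example IS
+    SIGNAL cnt     : UNSIGNED(g_cnt_w-1 DOWNTO 0);
+    SIGNAL nxt_cnt : UNSIGNED(g_cnt_w-1 DOWNTO 0);
+    SIGNAL dat     : STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+    SIGNAL nxt_dat : STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+  BEGIN
+    out_cnt <= STD_LOGIC_VECTOR(cnt);
+    out_dat <= dat;
+
+    p_clk : PROCESS(rst, clk)
+    BEGIN
+      IF rst='1' THEN
+        cnt <= (OTHERS=>'0');
+        dat <= (OTHERS=>'0');
+      ELSIF rising_edge(clk) THEN
+        cnt <= nxt_cnt;
+        dat <= nxt_dat;
+      END IF;
+    END PROCESS;
+
+    -- UNSIGNED + wraps modulo 2**g_cnt_w, so no check for the max is needed
+    nxt_cnt <= cnt + 1 WHEN in_val='1' ELSE cnt;
+
+    -- Invalid input data is not forced to a value, dat just holds
+    nxt_dat <= in_dat WHEN in_val='1' ELSE dat;
+  END rtl;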
+  
+  
+  
+3) RTL development views
+
+Basically there are four views on RTL code:
+
+a) as text
+b) as schematic (drawing flipflops and combinatorial logic like 'and','or', mux,
+   demux, +, * etc at RTL level or of components at structural level)
+c) as a timing diagram (like the Modelsim wave window)
+d) state machine drawing using state circles and arrows between them
+
+The end result is a text file, but for the development it is often useful to
+also make a schematic drawing or to draw a timing diagram to better understand
+the functions and their dependencies (latencies) in time. A state machine
+drawing is useful to ensure that the state machine is correct. Many engineers
+only use the text view as development view. However the other views can be
+quite helpful to understand and improve the text code.
+
+
+
+4) State machines
+
+There are two types of state machine: Mealy and Moore. The difference is that
+for a Moore state machine the outputs only depend on the state, whereas for a
+Mealy state machine the outputs also depend on the inputs. We typically only
+use Mealy state machines.
+A state machine drawing is a useful view of a state machine. In the drawing
+each state is represented by a circle and the arrows indicate the condition
+for a state change and the effect of this change on the outputs. For each
+circle the OR of all outgoing arrow conditions (including the arrow that stays
+in the same state) should be '1' (i.e. TRUE), so that every input situation is
+covered.
+In the RTL code all outputs get a default value and a case statement lists
+the conditions and effects on the outputs per state.
+Note that in fact all RTL logic can be regarded as Mealy state machines, but
+for e.g. a counter it is less useful to view it as a state machine.
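+
+The following is a minimal sketch of such a Mealy state machine in the
+no-variables style. It assumes a hypothetical request/acknowledge handshake.
+Every output gets a default value, and the CASE statement lists per state the
+conditions for a state change and their effect on the outputs. The state type
+uses enumerated s_* names. All names are illustrative assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+
+  ENTITY fsm_example IS  -- hypothetical example entity
+    PORT (
+      rst      : IN  STD_LOGIC;
+      clk      : IN  STD_LOGIC;
+      in_start : IN  STD_LOGIC;
+      in_ack   : IN  STD_LOGIC;
+      out_req  : OUT STD_LOGIC;
+      out_done : OUT STD_LOGIC
+    );
+  END fsm_example;
+
+  ARCHITECTURE rtl OF fsm_example IS
+    TYPE t_state IS (s_idle, s_wait);
+    SIGNAL state     : t_state;
+    SIGNAL nxt_state : t_state;
+  BEGIN
+    p_clk : PROCESS(rst, clk)
+    BEGIN
+      IF rst='1' THEN
+        state <= s_idle;
+      ELSIF rising_edge(clk) THEN
+        state <= nxt_state;
+      END IF;
+    END PROCESS;
+
+    p_state : PROCESS(state, in_start, in_ack)
+    BEGIN
+      -- Default values for the next state and for all outputs
+      nxt_state <= state;
+      out_req   <= '0';
+      out_done  <= '0';
+
+      CASE state IS
+        WHEN s_idle =>
+          IF in_start='1' THEN
+            nxt_state <= s_wait;
+          END IF;
+        WHEN s_wait =>
+          out_req <= '1';
+          IF in_ack='1' THEN
+            out_done  <= '1';     -- Mealy output: also depends on input in_ack
+            nxt_state <= s_idle;
+          END IF;
+      END CASE;
+    END PROCESS;
+  END rtl;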
+
+
+
+5) Clocked and combinatorial process
+
+a) RTL (Register Transfer Level)
+The RTL logic functionality can be described in clocked processes (registers,
+flip flops) and combinatorial (transfer) processes. The convention is to
+clearly separate the clocked process from the combinatorial processes, whereby
+the clocked process only lists the q <= nxt_q register assignments. The nxt_q
+assignments are done in a separate combinatorial process or in concurrent
+statements.
+
+  p_clk : PROCESS(clk)
+  BEGIN
+    IF rising_edge(clk) THEN
+      q <= nxt_q;   -- q becomes its d input called nxt_q
+    END IF;
+  END PROCESS;
+
+  nxt_q <= f(...);  -- concurrent statement
+  
+or:
+  
+  p_comb : PROCESS(sensitivity list)   -- process block statement
+  BEGIN
+    nxt_q <= f(sensitivity list);
+  END PROCESS;
+  
+Note that this scheme describes any logic function that we need to implement.
+Logic always consists of a clocked part and a combinatorial part (i.e. a Mealy
+machine).
+The path delay through the combinatorial part is what determines the maximum
+clock frequency. By adding more pipelining register stages the combinatorial
+paths can be shortened to increase the maximum clock frequency.
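+
+As a sketch of the pipelining remark, the hypothetical entity below adds three
+inputs in two register stages instead of one, so each combinatorial path
+contains only one adder. The clocked process only lists the register
+assignments, and the result appears two clock cycles after the inputs. The
+entity, generic and signal names are illustrative assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+  USE IEEE.numeric_std.ALL;
+
+  ENTITY add3_pipelined IS  -- hypothetical example entity
+    GENERIC (
+      g_dat_w : NATURAL := 16
+    );
+    PORT (
+      clk     : IN  STD_LOGIC;
+      in_x    : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      in_y    : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      in_z    : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      out_sum : OUT STD_LOGIC_VECTOR(g_dat_w+1 DOWNTO 0)  -- 2 extra bits
+    );
+  END add3_pipelined;
+
+  ARCHITECTURE rtl OF add3_pipelined IS
+    SIGNAL sum_xy     : UNSIGNED(g_dat_w DOWNTO 0);    -- stage 1: x + y
+    SIGNAL nxt_sum_xy : UNSIGNED(g_dat_w DOWNTO 0);
+    SIGNAL z_p        : UNSIGNED(g_dat_w-1 DOWNTO 0);  -- stage 1: z aligned
+    SIGNAL nxt_z_p    : UNSIGNED(g_dat_w-1 DOWNTO 0);
+    SIGNAL sum        : UNSIGNED(g_dat_w+1 DOWNTO 0);  -- stage 2: x + y + z
+    SIGNAL nxt_sum    : UNSIGNED(g_dat_w+1 DOWNTO 0);
+  BEGIN
+    out_sum <= STD_LOGIC_VECTOR(sum);
+
+    -- Clocked process: only the q <= nxt_q register assignments
+    p_clk : PROCESS(clk)
+    BEGIN
+      IF rising_edge(clk) THEN
+        sum_xy <= nxt_sum_xy;
+        z_p    <= nxt_z_p;
+        sum    <= nxt_sum;
+      END IF;
+    END PROCESS;
+
+    -- Combinatorial logic, one adder per pipeline stage
+    nxt_sum_xy <= RESIZE(UNSIGNED(in_x), g_dat_w+1) +
+                  RESIZE(UNSIGNED(in_y), g_dat_w+1);
+    nxt_z_p    <= UNSIGNED(in_z);
+    nxt_sum    <= RESIZE(sum_xy, g_dat_w+2) + RESIZE(z_p, g_dat_w+2);
+  END rtl;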
+
+Important clocking rules:
+- Do not use different clock edges. Within a clock domain only use
+  rising_edge(clk).
+- Do not use data signals as clock. Two exceptions are at a single central
+  point or at the output pin of the FPGA. The data signal that is used as clock
+  must come directly from a flipflop, because for a combinatorial output the
+  different internal path delays can cause glitches before the output settles.
+- Do not gate clocks, because within an FPGA this creates a new clock and the
+  number of clock trees is limited. In an ASIC clock gating can be used to
+  save power, but it then needs to be done centrally and in a controlled way
+  to ensure that no glitches occur due to the gating logic.
+- Use resets that are synchronized to their corresponding clock domain.
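+
+The following is a minimal sketch of such a reset synchronizer, given only as
+an illustration. It is not the implementation of common_areset (see 5h). The
+reset is asserted asynchronously and released synchronously to clk via a short
+shift register, so the release can never violate the recovery timing of the
+flipflops in the clk domain. The entity, generic and port names are
+hypothetical.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+
+  ENTITY reset_sync_example IS  -- hypothetical example entity
+    GENERIC (
+      g_delay_len : NATURAL := 3  -- number of synchronizer flipflops, >= 2
+    );
+    PORT (
+      in_rst  : IN  STD_LOGIC;    -- asynchronous reset input, active high
+      clk     : IN  STD_LOGIC;
+      out_rst : OUT STD_LOGIC     -- reset released synchronously to clk
+    );
+  END reset_sync_example;
+
+  ARCHITECTURE rtl OF reset_sync_example IS
+    SIGNAL rst_dly     : STD_LOGIC_VECTOR(g_delay_len-1 DOWNTO 0);
+    SIGNAL nxt_rst_dly : STD_LOGIC_VECTOR(g_delay_len-1 DOWNTO 0);
+  BEGIN
+    out_rst <= rst_dly(g_delay_len-1);
+
+    -- Assert asynchronously, release synchronously to clk
+    p_clk : PROCESS(in_rst, clk)
+    BEGIN
+      IF in_rst='1' THEN
+        rst_dly <= (OTHERS=>'1');
+      ELSIF rising_edge(clk) THEN
+        rst_dly <= nxt_rst_dly;
+      END IF;
+    END PROCESS;
+
+    -- Shift in '0', so the release propagates through g_delay_len flipflops
+    nxt_rst_dly <= rst_dly(g_delay_len-2 DOWNTO 0) & '0';
+  END rtl;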
+
+b) No latches
+The logic must have no latches; we only use flipflops and registers. Therefore
+every conditional statement (IF, CASE) in a combinatorial process should assign
+its outputs for all conditions. In VHDL
+it is often possible to first assign a default value to the nxt_* signals and
+then in the rest of the process assign the conditional value(s). E.g.:
+
+  nxt_q <= q;
+  IF cnt=b THEN
+    nxt_q <= 0;
+  END IF;
+
+The Quartus synthesis report warns about inferred latches. These unintentional
+latches must be corrected in the RTL code.
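+
+The difference can be sketched with two alternative combinatorial processes
+for the same hypothetical signals en, in_dat, q and nxt_q, with q <= nxt_q in
+the clocked process of 5a. The two processes are alternatives and must not be
+used together, because then nxt_q would have two drivers:
+
+  -- Wrong: nxt_q is not assigned when en='0', so a latch is inferred
+  p_comb_latch : PROCESS(en, in_dat)
+  BEGIN
+    IF en='1' THEN
+      nxt_q <= in_dat;
+    END IF;
+  END PROCESS;
+
+  -- Right: the default nxt_q <= q covers en='0', so the flipflop q simply
+  -- holds its value and no latch is inferred
+  p_comb : PROCESS(en, in_dat, q)
+  BEGIN
+    nxt_q <= q;
+    IF en='1' THEN
+      nxt_q <= in_dat;
+    END IF;
+  END PROCESS;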
+  
+c) Sensitivity list
+Make sure that all signals that are read in a combinatorial process are also in
+the sensitivity list, because that is how the synthesis and hardware will
+interpret it. The signals that are read in a process are:
+- Signals that are part of a condition e.g. '>, <, =' in e.g. an IF statement.
+  The signal can be on either side of the condition.
+- Signals that are assigned to another signal or variable, i.e. at the
+  right of <= or :=.
+If a signal is used but not in the sensitivity list of the combinatorial
+process then the results can be different in simulation, because then the
+process will not be evaluated if that input changes.
+The Quartus synthesis report warns about incomplete sensitivity lists. These
+warnings must be corrected in the RTL code.
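+
+The following sketch uses hypothetical signals in_a, in_b, sel and nxt_dat.
+The first process misses sel in its sensitivity list, so in simulation nxt_dat
+is not updated when only sel changes, while the synthesized multiplexer does
+react to sel. The second process lists every signal that it reads. Again these
+are alternatives, not to be used together. If the code may use VHDL-2008, then
+PROCESS(all) is another option that lets the tool derive the list.
+
+  -- Incomplete: sel is read but missing from the sensitivity list
+  p_mux_wrong : PROCESS(in_a, in_b)
+  BEGIN
+    nxt_dat <= in_a;
+    IF sel='1' THEN
+      nxt_dat <= in_b;
+    END IF;
+  END PROCESS;
+
+  -- Complete: every signal that is read is in the sensitivity list
+  p_mux : PROCESS(in_a, in_b, sel)
+  BEGIN
+    nxt_dat <= in_a;
+    IF sel='1' THEN
+      nxt_dat <= in_b;
+    END IF;
+  END PROCESS;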
+
+d) Limited use of variables
+Variables are preferably not used, because they are not visible in the
+ModelSim Wave window, which makes it difficult to debug them with respect to
+signals.
+
+To describe digital logic, variables are not needed; using only signals is
+sufficient. In fact using variables to describe digital logic is somewhat of a
+misconception, because variables are assigned sequentially in a process,
+whereas implemented logic is inherently parallel. For example a filter, a
+counter, but also every LUT or flipflop: in hardware they all run in parallel.
+Signals get their value at the end of a process; this reflects the parallel
+behaviour of logic. Hence all logic can be described using only signals. The
+sequential behaviour of variables resembles ordinary programming code that runs
+on a processor, hence it seems that variables were added to VHDL to pamper
+software designers and make VHDL look like e.g. C. In VHDL test benches
+variables can be
+used, e.g. to access a file.
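+
+The difference in timing can be sketched with two clocked processes. The
+sketch assumes hypothetical STD_LOGIC signals in_a, b_reg, c_reg and c_var.
+With signals, each assignment creates a register, so c_reg follows in_a after
+two clock cycles. With the variable v_b, the new value is read immediately in
+the same process execution, so c_var follows in_a after only one clock cycle
+and v_b does not become a register. The variable version is therefore easy to
+misread as a two-stage pipeline.
+
+  -- Signals: b_reg and c_reg are two registers in series
+  p_sig : PROCESS(clk)
+  BEGIN
+    IF rising_edge(clk) THEN
+      b_reg <= in_a;
+      c_reg <= b_reg;   -- reads the old value of b_reg
+    END IF;
+  END PROCESS;
+
+  -- Variable: v_b is written before it is read, so it is no register
+  p_var : PROCESS(clk)
+    VARIABLE v_b : STD_LOGIC;
+  BEGIN
+    IF rising_edge(clk) THEN
+      v_b   := in_a;
+      c_var <= v_b;     -- reads the new value of v_b
+    END IF;
+  END PROCESS;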
+
+e) Avoiding delta-delay problems with derived clocks
+Assigning a clock to another signal causes a delta delay. If data is passed on
+between logic that is clocked by these two (identical) clocks, then the
+simulation can mismatch the reality (synthesis), because the data may then be
+captured in a different clock cycle in simulation than in hardware, due to the
+delta delay between the rising_edge() of the two clocks. To avoid these
+delta-delay problems, all data processing logic should only use the derived
+clock. For reset signals it is less critical, because a design typically does
+not, and should not, rely on the reset release being one cycle later or not.
+Passing a clock through hierarchy does not cause delta delays. However, for a
+clock that is created inside a component and then used both internally and
+externally, the clock needs to be assigned to the output port via an auxiliary
+signal like clk_out <= i_clk_out, because an output port signal cannot be read
+in VHDL (before VHDL-2008). To avoid the delta-delay problem this implies that
+the derived clock needs to be output as clk_out and then input back as clk_in.
+Both internally and
+externally the clk_in should then be used to clock the data.
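+
+The following sketch of this scheme assumes a hypothetical component that
+derives a clock by dividing ref_clk by 2 directly from a flipflop output. In
+practice the derived clock would typically come from e.g. a PLL (see the
+clocking rules in 5a). All data logic, inside and outside the component, is
+clocked by the same signal that connects clk_out back to clk_in. All of that
+logic therefore sees the rising edge in the same delta cycle. All names are
+illustrative assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+
+  ENTITY clk_div_example IS  -- hypothetical example entity
+    PORT (
+      ref_clk : IN  STD_LOGIC;
+      clk_out : OUT STD_LOGIC;  -- derived clock, connect externally to clk_in
+      clk_in  : IN  STD_LOGIC;  -- derived clock, clocks the internal data logic
+      in_dat  : IN  STD_LOGIC;
+      out_dat : OUT STD_LOGIC
+    );
+  END clk_div_example;
+
+  ARCHITECTURE rtl OF clk_div_example IS
+    SIGNAL i_clk_out : STD_LOGIC := '0';  -- initial value for simulation
+    SIGNAL dat       : STD_LOGIC;
+  BEGIN
+    -- Derived clock: ref_clk divided by 2, directly from a flipflop output
+    p_div : PROCESS(ref_clk)
+    BEGIN
+      IF rising_edge(ref_clk) THEN
+        i_clk_out <= NOT i_clk_out;
+      END IF;
+    END PROCESS;
+
+    clk_out <= i_clk_out;  -- the OUT port itself cannot be read (pre VHDL-2008)
+
+    -- Data logic uses clk_in, i.e. the same net as the external logic
+    p_clk : PROCESS(clk_in)
+    BEGIN
+      IF rising_edge(clk_in) THEN
+        dat <= in_dat;
+      END IF;
+    END PROCESS;
+
+    out_dat <= dat;
+  END rtl;
+
+  -- Usage in the parent architecture: the same signal div_clk connects clk_out
+  -- and clk_in, and also clocks the external logic that exchanges data with
+  -- u_clk_div:
+  --
+  --   u_clk_div : ENTITY work.clk_div_example
+  --   PORT MAP (
+  --     ref_clk => ref_clk,
+  --     clk_out => div_clk,
+  --     clk_in  => div_clk,
+  --     in_dat  => div_in_dat,
+  --     out_dat => div_out_dat
+  --   );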
+
+f) Default only use positive edge triggered flip-flops
+
+g) Use reset to clear flip-flops only
+Asynchronous and synchronous resets should only place flip-flops in a known
+start-up state. Resets should not be used for other purposes. It should be
+clear where a reset is generated and to which clock domain it belongs.
+
+h) Clock domain crossing
+Use the dedicated components from common library to cross clock domains:
+
+- common_areset.vhd             -- for a reset
+- common_async.vhd              -- for a level signal
+- common_spulse.vhd             -- for a pulse signal
+- common_reg_cross_domain.vhd   -- for MM data vector
+- common_fifo_dc.vhd            -- for streaming data vector
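+
+To show what such a component does for a level signal, the following is a
+minimal sketch of a synchronizer chain with hypothetical names. It is not the
+code of common_async. Unlike a production component, it also sets no
+tool-specific synchronizer attributes. It is only suitable for a single-bit
+level signal. Multi-bit vectors need common_reg_cross_domain.vhd or
+common_fifo_dc.vhd, because the bits of a vector can arrive in different clock
+cycles.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+
+  ENTITY level_sync_example IS  -- hypothetical example entity
+    GENERIC (
+      g_delay_len : NATURAL := 2  -- number of synchronizer flipflops, >= 2
+    );
+    PORT (
+      clk  : IN  STD_LOGIC;   -- destination clock domain
+      din  : IN  STD_LOGIC;   -- level signal from another clock domain
+      dout : OUT STD_LOGIC    -- din synchronized to clk
+    );
+  END level_sync_example;
+
+  ARCHITECTURE rtl OF level_sync_example IS
+    SIGNAL din_d     : STD_LOGIC_VECTOR(g_delay_len-1 DOWNTO 0) := (OTHERS=>'0');
+    SIGNAL nxt_din_d : STD_LOGIC_VECTOR(g_delay_len-1 DOWNTO 0);
+  BEGIN
+    dout <= din_d(g_delay_len-1);
+
+    p_clk : PROCESS(clk)
+    BEGIN
+      IF rising_edge(clk) THEN
+        din_d <= nxt_din_d;
+      END IF;
+    END PROCESS;
+
+    -- Shift din in at the LSb, the MSb is the synchronized output
+    nxt_din_d <= din_d(g_delay_len-2 DOWNTO 0) & din;
+  END rtl;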
+
+i) Register outputs
+By default, register the outputs of a component. Typically there is then no
+need to register the inputs. Registering the outputs between components eases
+timing closure, because each path into a component then starts at a flipflop.
+Exceptions are e.g. the ready output for streaming flow control, which
+typically needs to be a combinatorial output to maintain the ready latency.
+
+
+
+6) No-variables method
+
+The RTL scheme described at point 5) is referred to as the 'no-variables
+method'. The no-variables method was used in LOFAR and is also used in the
+UniBoard firmware. The no-variables method separates the clocked process that
+defines the registers from the combinatorial process that defines the function.
+There is only one clocked process per clock domain but there can be multiple
+combinatorial processes. The processes do not use variables.
+
+
+7) Gaisler two-process method
+
+The Gaisler two-process method can be regarded as a clever method that helps
+to bridge the gap between a software approach (more sequential thinking) and
+a hardware approach (more parallel thinking) towards developing logic. In the
+Gaisler two-process method there is:
+
+- one clocked process and
+- one combinatorial process.
+
+All local registers are grouped in a local record type such that the clocked
+process becomes quite simple and uniform and whereby the functional operation
+is in the combinatorial process:
+
+  TYPE t_reg IS RECORD
+    -- local registers (flip flops)
+  END RECORD;
+  
+  CONSTANT c_reg_rst  : t_reg := (<reset values for the t_reg fields>);
+
+  SIGNAL r, nxt_r : t_reg;
+  
+  -- calling the local registers 'r' also fits the Gaisler style
+  -- or instead to fit the Gaisler style call 't_reg' --> 'reg_type'
+  -- or instead to fit the Gaisler style call 'nxt_r' --> 'rin'
+  
+  -- Map t_reg outputs to entity outputs
+  <entity outputs> <= r outputs;
+  
+  -- p_reg
+  r <= nxt_r WHEN rising_edge(clk);
+
+  p_comb : PROCESS(rst, r, <other inputs>)
+    VARIABLE v : t_reg;
+  BEGIN
+    -- Default
+    v := r;
+    
+    -- Functionality
+    <Here the logical operations on r and the other inputs to determine
+     nxt_r and the combinatorial outputs are defined.>
+    
+    -- Reset and nxt_register
+    IF rst='1' THEN
+      v := c_reg_rst;
+    END IF;
+
+    nxt_r <= v;  
+  END PROCESS;
+  
+Note that p_reg is similar to the RTL clocked process style defined in 5a,
+but the big advantage of the Gaisler style is that all local registers get
+nicely grouped into one record.
+  
+The single combinatorial process uses the next register value variable
+v : t_reg and some more auxiliary variables if necessary. The rst is applied at
+the end, so that it acts like a synchronous reset, and then v is assigned to
+nxt_r. The variable v gets initialized with the current register value r and
+the rest of the process code describes and defines the logical operation on r
+and the other process inputs to get the nxt_r. The nxt_r is only assigned to, so
+therefore it is not in the sensitivity list.
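+
+The following is a minimal concrete instance of the template above. It
+assumes a hypothetical entity that counts valid inputs and outputs a one-cycle
+done pulse after every g_nof_words valid inputs. Note that all IF conditions
+read r and never v, and that rst is applied at the end of p_comb. All names
+are illustrative assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+
+  ENTITY gaisler_example IS  -- hypothetical example entity
+    GENERIC (
+      g_nof_words : POSITIVE := 4
+    );
+    PORT (
+      rst      : IN  STD_LOGIC;
+      clk      : IN  STD_LOGIC;
+      in_val   : IN  STD_LOGIC;
+      out_done : OUT STD_LOGIC  -- pulse after every g_nof_words valid inputs
+    );
+  END gaisler_example;
+
+  ARCHITECTURE rtl OF gaisler_example IS
+    TYPE t_reg IS RECORD
+      cnt  : NATURAL RANGE 0 TO g_nof_words-1;
+      done : STD_LOGIC;
+    END RECORD;
+
+    CONSTANT c_reg_rst : t_reg := (cnt => 0, done => '0');
+
+    SIGNAL r, nxt_r : t_reg;
+  BEGIN
+    -- Map t_reg outputs to entity outputs
+    out_done <= r.done;
+
+    -- p_reg
+    r <= nxt_r WHEN rising_edge(clk);
+
+    p_comb : PROCESS(rst, r, in_val)
+      VARIABLE v : t_reg;
+    BEGIN
+      -- Default
+      v := r;
+      v.done := '0';
+
+      -- Functionality: conditions only read r and the inputs, never v
+      IF in_val='1' THEN
+        IF r.cnt = g_nof_words-1 THEN
+          v.cnt  := 0;
+          v.done := '1';
+        ELSE
+          v.cnt := r.cnt + 1;
+        END IF;
+      END IF;
+
+      -- Reset and nxt_register
+      IF rst='1' THEN
+        v := c_reg_rst;
+      END IF;
+
+      nxt_r <= v;
+    END PROCESS;
+  END rtl;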
+
+The description in dp_packet_merge.vhd explains how to use v and r in p_comb.
+
+The default/preferred coding style rule is that v is only assigned, so not
+read. This means that v does not occur in an if condition and also not right
+of :=. If v needs to be read then first use another variable to determine this
+intermediate combinatorial result based on the inputs and r, and then assign
+this variable to v. In this way the use of v is focused on defining the next
+input for r. Using nxt_r or v seems similar in this way, because in both
+schemes they are only assigned to. A difference is that the scheme with nxt_r
+uses an asynchronous rst, whereas the Gaisler scheme with v uses a synchronous
+rst.
+
+The default/preferred coding style is to treat each v field implementation for
+r in a separate section in the p_comb process. The alternative would be to 
+have a separate section per input and per r field, but that seems less clean
+in general.
+
+The reset is applied in p_comb so it is used as a synchronous reset, for most
+functionality this is appropriate. Still the reset could instead be applied
+within p_reg to have an asynchronous reset. The advantage of an asynchronous
+reset is that it gets applied even without a clock. The advantage of a
+synchronous reset is that its timing with respect to the clock gets taken care
+of automatically like any other register data input.
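+
+The asynchronous variant can be sketched as a replacement p_reg for the
+template above. The reset then moves from p_comb to p_reg. The IF rst='1'
+block is removed from p_comb, and rst is removed from its sensitivity list:
+
+  p_reg : PROCESS(rst, clk)
+  BEGIN
+    IF rst='1' THEN
+      r <= c_reg_rst;       -- applied immediately, also without a clock
+    ELSIF rising_edge(clk) THEN
+      r <= nxt_r;
+    END IF;
+  END PROCESS;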
+
+In 5d it is argued to minimize the use of variables. However, within p_comb for
+the Gaisler method using variables is appropriate. In fact the variables are
+used more as auxiliary or temporary variables, therefore they can have
+insignificant names while the signals represent the function of the process and
+have the significant functional names. Using variables as temporary variables
+fits a sequential way of thinking and fits the VHDL definition of variables in
+a VHDL process, and it does not clash with the parallel nature of digital
+logic. Therefore the general rule for using variables seems to be: describe the
+true functionality in signals and use variables only as auxiliary variables to
+hold temporary results.
+
+A variable in a process can also be used to hold a dynamic semi-constant value
+that gets determined at process entry, but that does not get modified further
+on, e.g. as with v_siso_arr_* in dp_bsn_align.
+
+A mixed style is also possible, whereby the t_reg record is used to have the
+clarity and ease of the single assignment clocked process, but whereby the
+logical functionality is still defined in one or more combinatorial processes
+and/or concurrent statements to reflect the parallel behaviour of digital
+logic.
+
+
+
+8) Directory structure
+
+It is proper to clearly separate the VHDL that describes logic that can run on
+hardware from the VHDL that describes the test bench.
+
+<module name>/build/sim/modelsim/
+             /build/synth/quartus/
+             /data/
+             /src/vhdl
+             /tb/vhdl
+
+
+
+9) Naming conventions
+
+All names and text must be in English.
+
+All names must reflect the use at the correct level. E.g. a general purpose 
+counter can be called common_cnt, while its instance is called u_rx_cnt. It
+would be wrong to call the counter entity rx_cnt if it is in fact a general
+purpose counter.
+
+Names like tmp, help, cnt2 are bad. In general if you can not give an object
+(e.g. a signal, an entity) a proper name then that is a sign that you do not
+(yet) have a clear view on what your design should do and how it should work.
+Hence spending time on defining accurate names is an integral part of proper
+design.
+
+The structure of a design consists of components and processes. A general
+naming convention is to give the interfaces within the structure a name and to
+use this interface name as prefix, middle part or postfix in the corresponding
+signal names. Note that in this way the naming directly relates to creating a
+proper design structure.
+
+
+a) Language key words
+The convention for manually written code is to use capitals for all VHDL key
+words and lower case characters for entities, signals etc. Underscores are used
+to separate parts of a name, so e.g. my_signal_name, proc_name.
+
+
+b) Files, entities, architectures
+Hierarchy within a module is represented via the VHDL file naming. Typically
+the entity and architecture are kept in the same file.
+
+ - The file name is always equal to the entity name, with the suffix .vhd.
+ - If a specific architecture is kept in a separate file, the architecture
+   file name ends with _a_<category>.vhd.
+ - General package filenames end with _pkg.vhd.
+ - Component declaration package filenames end with _component_pkg.vhd.
+
+For the VHDL architecture names we use the following
+category names within /UniBoard/Firmware:
+
+  - pkg      = VHDL package
+  - str      = Structure architecture containing only components
+  - rtl      = RTL architecture containing Register Transfer Level code (i.e.
+               processes)
+  - wrap     = Wrapper structure
+  - stratix4 = Wrapper structure containing Stratix4 specific components
+  - beh      = Architecture containing behavioral code (e.g. for a test bench
+                 model of an I2C sensor, a flash, an ADC, etc)
+  - empty    = Empty architecture
+  - tb       = Test bench architecture
+  
+The two main architecture categories are 'rtl' and 'str'. Typically the top
+level components consist of 'str' architectures and the lowest level components
+(the 'leaves') contain the 'rtl' architectures that actually define the
+function. For external IP like from the MegaWizard a 'wrap' architecture hides
+these IP architectures. In practice it can occur that RTL code needs to be
+combined with instantiated components, so then the architecture name is a bit
+arbitrary.
+
+Putting the entity and architecture into separate files only seems useful when
+the architecture is FPGA vendor specific and would cause problems if e.g. a
+'stratix4' and a 'virtex6' architecture are both visible to the synthesis
+tool. Keeping the package in a separate file e.g. st_pkg.vhd remains useful,
+because it avoids unnecessary recompilations. The category name can then be
+used in the VHDL architecture file name by combining the entity name with the
+architecture category as postfix, so:
+
+  <entity       file name> = <entity name>.vhd
+  <package      file name> = <package name>_pkg.vhd
+                           = <package name>_component_pkg.vhd
+  <architecture file name> = <entity name>_a_category.vhd
+  
+Instead of these file name postfixes, braces () as in (pkg) and (architecture
+category) have also been used in file names, but it appears that 'make' under
+Linux cannot cope well with braces in file names. Therefore do not use braces
+() in file names.
+
+
+c) Library directory
+Within a library that has several files, all files should start with the
+same prefix. That prefix corresponds to the library directory name and is
+typically also used for all items in a library_pkg.vhd file if that is used.
+
+For example some module called ST could look like:
+
+  /modules/st/sim/modelsim/st.mpf      -- modelsim project file
+           st/src/vhdl/st.vhd          -- top ST entity with str architecture
+                       st_pkg.vhd      -- file name postfix '_pkg'
+                       st_ctrl.vhd
+                       st_ctrl_tx.vhd
+                       st_calc.vhd
+                       st_calc_a_str.vhd  -- file name post fix '_a_' + 
+                                          -- architecture name
+           st/tb/vhdl/tb_st.vhd        -- test bench for st
+                      tb_tb_st.vhd     -- test bench of multiple tb_st
+                      tb_st_calc.vhd   -- test bench for st_calc
+        
+In this example the '_tx' functionality in st_ctrl_tx is specific to st_ctrl,
+therefore it is not put in a more general separate module. A general type
+of IO like an I2C interface would be kept as a separate module so that it
+can be used easily within other modules.
+
+This ST module may be used in different modules or designs, typically using
+generics to adapt it to the design specific requirements (e.g. word width).
+
+
+d) Constants and generics
+All constant values must be identified by a name in order to easily search for
+them. It is not allowed to use numbers directly in statements. Entity port
+widths must be defined via generics, even if a range is fixed. This is to
+clearly distinguish the meaning of the range value. For some port widths the
+generic may even allow choosing another value dependent on the usage.
+Furthermore constants and generics have to be clearly distinguished from
+signals. Therefore use prefixes:
+
+. c_      constant
+. k_      constant 
+. g_      generic
+
+For example:
+
+  CONSTANT c_word_sz    : NATURAL := 4;
+  CONSTANT c_byte_w     : NATURAL := 8;
+  CONSTANT c_word_w     : NATURAL := c_byte_w*c_word_sz;
+
+The k_ prefix is rarely used, but it can e.g. be used as a reference for the
+actual constant:
+
+  -- c_max_w = 3, c_actual_w = 2 <= c_max_w
+  CONSTANT k_sel : STD_LOGIC_VECTOR(c_max_w-1 DOWNTO 0) := "000";
+  CONSTANT c_sel : STD_LOGIC_VECTOR(c_actual_w-1 DOWNTO 0) :=
+                   k_sel(c_actual_w-1 DOWNTO 0);
+
+or e.g. as a local constant in a package that should not be used outside that
+package.
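+
+As a sketch of these rules, the hypothetical entity below defines its port
+widths via g_ generics and derives the internal width as a c_ constant. As a
+result, no literal numbers appear in the statements. All names are
+illustrative assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.std_logic_1164.ALL;
+
+  ENTITY port_width_example IS  -- hypothetical example entity
+    GENERIC (
+      g_dat_w   : NATURAL := 8;   -- width of one data word
+      g_nof_dat : NATURAL := 4    -- number of data words in in_dat, >= 1
+    );
+    PORT (
+      clk     : IN  STD_LOGIC;
+      in_dat  : IN  STD_LOGIC_VECTOR(g_nof_dat*g_dat_w-1 DOWNTO 0);
+      out_dat : OUT STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0)
+    );
+  END port_width_example;
+
+  ARCHITECTURE rtl OF port_width_example IS
+    CONSTANT c_in_w : NATURAL := g_nof_dat*g_dat_w;
+
+    SIGNAL dat : STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+  BEGIN
+    out_dat <= dat;
+
+    -- Register the most significant data word of in_dat
+    p_clk : PROCESS(clk)
+    BEGIN
+      IF rising_edge(clk) THEN
+        dat <= in_dat(c_in_w-1 DOWNTO c_in_w-g_dat_w);
+      END IF;
+    END PROCESS;
+  END rtl;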
+
+e) Variables
+Similarly, variables have to be distinguished from signals. Therefore use the
+prefixes:
+
+. v_      variable
+. v       next value for local register record in Gaisler two-process method
+
+
+f) Signals
+Signals do not need a prefix or a postfix to identify them as signals; instead,
+for signals a prefix or postfix is better used to clarify their functional
+meaning:
+
+. _ack     acknowledge
+. _adr     address
+. _addr    address
+. _address address
+. _arr     array
+. _avail   available
+. _bi      bit index, or byte index
+. buf      buffer, memory
+. _clk     clock (used in a rising_edge process)
+. clr      clear
+. cnt      counter
+. ctrl     control
+. _cpx     complex
+. _cplx    complex
+. _complex complex
+. _cur     current
+. _d       delayed signal
+. _dat     data
+. _data    data
+. _dec     decode
+. _decode  decode
+. _depth   depth, size
+. _dis     disable
+. _dly     delayed signal
+. dp_      data path streaming interface signal
+. _done    done
+. _ely     early signal, equivalent to 'nxt_' and to the opposite of '_dly'
+. _early   early signal
+. _en      enable
+. _enc     encode
+. _encode  encode
+. _eof     end of frame
+. _eop     end of packet
+. _err     error
+. _evt     event
+. _fevt    falling event
+. ff       flipflop
+. fifo     first in first out
+. hdr      header
+. i_       internal copy of an output signal
+. in       input
+. io       input/output
+. _i       internal auxiliary signal, where signal 'x_i' closely relates to 'x'
+. _im      imaginary
+. _imag    imaginary
+. _late    late signal
+. _lat     latency
+. _latency latency
+. _len     length in e.g. bytes or data words
+. mem      memory
+. _max     maximum
+. _min     minimum
+. _miso    master in slave out
+. _mosi    master out slave in
+. mm_      memory mapped interface signal
+. _n       active low
+. _N       negative pin of a differential signal
+. nat      natural
+. nof_     number of
+. next_    next functional value
+. nxt_     next value of a register, e.g. q <= nxt_q
+. ofs      offset
+. offset   offset
+. out      output
+. _org     original
+. _p       pipeline delay (or early)
+. _pp      pipeline delay two clock cycles
+. _ppp     pipeline delay three clock cycles
+. _P       positive pin of a differential signal
+. _phs     phase
+. pulse    pulse
+. prev_    previous signal value, equivalent to '_dly'
+. r        local register record in Gaisler two-process method
+. ram      memory
+. rd       read
+. _re      real
+. _real    real
+. read     read
+. _rdy     ready
+. _reg     registered signal
+. _revt    rising event
+. req      request
+. rst      reset
+. rsl      resolution
+. rcv      receive
+. rx       receive
+. s_       state machine state enumerate name
+. sel      select
+. _siso    source in sink out
+. _sosi    source out sink in
+. sl       standard logic
+. slv      standard logic vector
+. _sof     start of frame
+. _sop     start of packet
+. st_      state machine state name
+. st_      streaming interface signal
+. _size    size, depth
+. _sz      size in e.g. bytes
+. _sync    synchronisation
+. this_    this_ signal as related to some next_ signal
+. tr       transmit/receive, transceiver
+. tx       transmit
+. _val     valid
+. _vec     vector
+. _wi      word index
+. wr       write
+. write    write
+. _w       width in bits
+. x        times, e.g. clk_2x
+. xmt      transmit
+. zdly     z^(-1) DSP sample period delay
+
+For example:
+
+  CONSTANT c_nof_mp          : NATURAL := 1;
+  CONSTANT c_nof_rcu_per_mp  : NATURAL := 4;
+  CONSTANT c_nof_rcu         : NATURAL := c_nof_mp * c_nof_rcu_per_mp;
+
+  SIGNAL tbb_frame_hdr       : t_frame_hdr_arr;
+  
+Usage examples:
+
+- A bus of _dat[], _val, _sync together indicates the data, whether it is valid
+  and its relation to some external time.
+- A bus of _hdr, _dat, _val, _sof, _eof can indicate a packet with header and
+  data in a stream of words.
+- At the rising clock edge the q output of a flipflop or register takes the
+  value of its d input, therefore the d input is called nxt_q. The 'nxt_'
+  prefix is used for all register d inputs, and the clocked process only
+  contains assignments like q <= nxt_q.
+
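+Below is a minimal sketch of the nxt_ scheme for a hypothetical counter with
+enable. The clocked process p_clk only contains the cnt <= nxt_cnt assignment,
+and the combinatorial process p_cnt determines nxt_cnt. The entity and port
+names are assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+  USE IEEE.NUMERIC_STD.ALL;
+
+  ENTITY example_counter IS
+    GENERIC (
+      g_cnt_w : NATURAL := 8
+    );
+    PORT (
+      rst     : IN  STD_LOGIC;
+      clk     : IN  STD_LOGIC;
+      in_en   : IN  STD_LOGIC;
+      out_cnt : OUT STD_LOGIC_VECTOR(g_cnt_w-1 DOWNTO 0)
+    );
+  END example_counter;
+
+  ARCHITECTURE rtl OF example_counter IS
+    SIGNAL cnt     : UNSIGNED(g_cnt_w-1 DOWNTO 0);
+    SIGNAL nxt_cnt : UNSIGNED(g_cnt_w-1 DOWNTO 0);
+  BEGIN
+    out_cnt <= STD_LOGIC_VECTOR(cnt);
+
+    -- Clocked process: only q <= nxt_q like assignments
+    p_clk : PROCESS(rst, clk)
+    BEGIN
+      IF rst = '1' THEN
+        cnt <= (OTHERS => '0');
+      ELSIF rising_edge(clk) THEN
+        cnt <= nxt_cnt;
+      END IF;
+    END PROCESS;
+
+    -- Combinatorial process: determines the next register value
+    p_cnt : PROCESS(cnt, in_en)
+    BEGIN
+      nxt_cnt <= cnt;          -- default: hold the value, this also avoids latches
+      IF in_en = '1' THEN
+        nxt_cnt <= cnt + 1;    -- wraps around to 0 after 2**g_cnt_w-1
+      END IF;
+    END PROCESS;
+  END rtl;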
+  
+g) Types
+. t_       type
+. t_c_     a type for constant values, typically a record
+. t_e_     an enumerate type
+. t_*_enum an enumerate type
+. t_reg    local registers record for Gaisler two-process method (reg_type)
+. _arr     array with indexing (I)
+. _2arr    two-dimensional array defined as array of arrays with indexing (I)(J), index range J is fixed by the type
+. _3arr    three-dimensional array defined as array of arrays of arrays with indexing (I)(J)(K), index ranges J and K are fixed by the type
+. _mat     matrix, two-dimensional array with indexing (I,J)
+. _matrix  matrix, two-dimensional array with indexing (I,J)
+. _cub     cube, three-dimensional array with indexing (I,J,K)
+. _cube    cube, three-dimensional array with indexing (I,J,K)
+. _rec     record
+
+For example:
+
+  TYPE t_frame_hdr_arr IS ARRAY (0 TO c_nof_rcu-1) OF t_frame_hdr_rec;
+  TYPE t_sl_matrix IS ARRAY(INTEGER RANGE<>, INTEGER RANGE <>) OF STD_LOGIC;
+  TYPE t_data_mat IS ARRAY (0 TO c_nof_tlen-1, 0 TO c_nof_input-1) OF STD_LOGIC_VECTOR(g_data_w-1 DOWNTO 0);
+ 
+
+h) Processes, instances, generates
+. p_      process name
+. u_      component instance name
+. gen_    generate name
+. no_     'no generate' name
+
+The clocked process that creates the registers can typically be called p_clk
+or p_reg. The combinatorial processes get the name of their main output signal
+or a name that reflects the task of that process.
+
+A component instance typically gets the component name preceded by 'u_' as
+instance name, or an instance name that reflects the task of the component or
+the main output of the component.
+
+Generate example:
+
+  CONSTANT c_debug_no_cep : NATURAL := sel_sim_synth(g_sim, 0, 0);
+  no_cep : IF c_debug_no_cep /= 0 GENERATE
+    -- default signal assignments
+  END GENERATE;
+  gen_cep : IF c_debug_no_cep = 0 GENERATE
+    -- instantiate module cep
+  END GENERATE;
+
+  
+i) Procedures, functions
+
+Procedures and functions may have a prefix or postfix. They may also use
+capitals to separate parts of their name. Less common functions and procedures
+should have these prefixes in addition to their package prefix:
+
+. func_<package_prefix>_...   function prefix
+. proc_<package_prefix>_...   procedure prefix
+
+or
+
+. <package_prefix>_func_...   function prefix
+. <package_prefix>_proc_...   procedure prefix
+
+
+j) Packages
+
+All declarations in a package must have a package_prefix. This package_prefix
+must be relatively short and sufficient to give a clue on where the item
+is defined. Package prefix examples:
+
+- common_pkg       -> no prefix, this is an exception, because this package is
+                      so common
+- common_mem_pkg   -> '_mem_'
+- dp_stream_pkg    -> '_dp_'
+- diag_pkg         -> '_diag_'
+- tr_nonbonded_pkg -> '_trnb_'
+- ddr3_pkg         -> '_ddr3_'
+
+
+k) Do not use other HDL reserved words
+Avoid using reserved words from Verilog in VHDL and vice versa. Also avoid
+using them, e.g. as record field names.
+
+
+
+10) Coding conventions
+
+a) Default use descending bus order
+
+Descending (DOWNTO) bus order must be used whenever possible in the declaration
+of arrays. Ascending (TO) bus order can be used for signals for which the
+ascending number may have a specific significance (stages of a processing,
+filter tap indexes, etc.).
+
+b) Default use named association for the generic map and port map
+Avoid instantiating components using positional association for generics or
+ports. Using named association avoids hard to detect errors and improves
+the readability. An exception can be a component instance in a wrapper, like
+the technology independent IP wrappers in RadioHDL.
+
+c) Component instantiation
+Default use entity instantiation to instantiate a component. This avoids the
+need for component declarations (either locally or in a component package).
+
+For technology independent code component instantiation is used to support
+multiple IP components from different vendors or FPGA families without having
+to compile this IP, because only the plain VHDL component declaration needs 
+to be known.
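+
+Below is a hedged sketch of b) and c) together, using the hypothetical
+entities example_inv and example_top. The instance u_inv uses entity
+instantiation from the work library, so no component declaration is needed,
+and both the generic map and the port map use named association. The auxiliary
+entity appears first in the file because of the compile order.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+
+  ENTITY example_inv IS
+    GENERIC (
+      g_dat_w : NATURAL := 8
+    );
+    PORT (
+      in_dat  : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      out_dat : OUT STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0)
+    );
+  END example_inv;
+
+  ARCHITECTURE rtl OF example_inv IS
+  BEGIN
+    out_dat <= NOT in_dat;
+  END rtl;
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+
+  ENTITY example_top IS
+    GENERIC (
+      g_dat_w : NATURAL := 16
+    );
+    PORT (
+      in_dat  : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      out_dat : OUT STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0)
+    );
+  END example_top;
+
+  ARCHITECTURE str OF example_top IS
+  BEGIN
+    -- Entity instantiation with named association, no component declaration
+    u_inv : ENTITY work.example_inv
+    GENERIC MAP (
+      g_dat_w => g_dat_w
+    )
+    PORT MAP (
+      in_dat  => in_dat,
+      out_dat => out_dat
+    );
+  END str;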
+
+d) Only use IEEE.std_logic_1164 and IEEE.numeric_std libraries
+Do not use deprecated numeric libraries:
+- Do not use BIT or BIT_VECTOR. Use only STD_LOGIC and STD_LOGIC_VECTOR for
+  bits and bit vectors.
+- Do not use the older (and unofficial) std_logic_unsigned and
+  std_logic_signed.
+- Do not use std_logic_arith.
+
+Note that the same type name (e.g. SIGNED) from different packages actually
+denotes different types that are not interoperable. Mixing them causes
+compatibility issues.
+
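+Below is a minimal sketch of signed arithmetic that uses only
+IEEE.NUMERIC_STD, assuming a hypothetical adder example_add. The
+STD_LOGIC_VECTOR inputs are explicitly converted to SIGNED, resized by one bit
+to avoid overflow, added, and converted back.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+  USE IEEE.NUMERIC_STD.ALL;   -- no std_logic_arith, std_logic_signed or std_logic_unsigned
+
+  ENTITY example_add IS
+    GENERIC (
+      g_in_w : NATURAL := 8
+    );
+    PORT (
+      in_a    : IN  STD_LOGIC_VECTOR(g_in_w-1 DOWNTO 0);
+      in_b    : IN  STD_LOGIC_VECTOR(g_in_w-1 DOWNTO 0);
+      out_sum : OUT STD_LOGIC_VECTOR(g_in_w DOWNTO 0)   -- one bit wider than the inputs
+    );
+  END example_add;
+
+  ARCHITECTURE rtl OF example_add IS
+  BEGIN
+    -- RESIZE sign-extends both operands first, so the sum cannot overflow
+    out_sum <= STD_LOGIC_VECTOR(RESIZE(SIGNED(in_a), g_in_w+1) + RESIZE(SIGNED(in_b), g_in_w+1));
+  END rtl;
+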
+e) Do not use the BLOCK statement
+Because they are awkward and not necessary.
+
+g) Do not use CONFIGURATIONs
+Because they are awkward and not necessary.
+
+h) Avoid embedded, tool-specific synthesis commands
+Synthesis directives are best placed in constraint files. Generalized
+tool-independent directives for synthesis commands are emerging. Use them
+sparingly and check that they are supported by at least the major synthesis
+tools and FPGA families.
+
+i) Use enumerate values for FSM states
+Use enumerate values for Finite State Machine (FSM) states and define them
+with an 's_' prefix. Do not hard-code the FSM state vector values, i.e. do
+not code the state encoding explicitly.
+
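+Below is a hedged sketch of an FSM with enumerate states, for a hypothetical
+component that outputs a single clock cycle pulse after in_sl goes high. The
+state type t_state_enum only lists s_ values, so no state vector encoding is
+coded and the synthesis tool can choose it. in_sl is assumed to be synchronous
+to clk.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+
+  ENTITY example_edge_fsm IS
+    PORT (
+      rst       : IN  STD_LOGIC;
+      clk       : IN  STD_LOGIC;
+      in_sl     : IN  STD_LOGIC;
+      out_pulse : OUT STD_LOGIC
+    );
+  END example_edge_fsm;
+
+  ARCHITECTURE rtl OF example_edge_fsm IS
+    TYPE t_state_enum IS (s_idle, s_pulse, s_wait_low);   -- no hard-coded state vector
+    SIGNAL state     : t_state_enum;
+    SIGNAL nxt_state : t_state_enum;
+  BEGIN
+    p_clk : PROCESS(rst, clk)
+    BEGIN
+      IF rst = '1' THEN
+        state <= s_idle;
+      ELSIF rising_edge(clk) THEN
+        state <= nxt_state;
+      END IF;
+    END PROCESS;
+
+    p_state : PROCESS(state, in_sl)
+    BEGIN
+      nxt_state <= state;   -- defaults avoid latches
+      out_pulse <= '0';
+      CASE state IS
+        WHEN s_idle =>
+          IF in_sl = '1' THEN
+            nxt_state <= s_pulse;
+          END IF;
+        WHEN s_pulse =>
+          out_pulse <= '1';
+          nxt_state <= s_wait_low;
+        WHEN OTHERS =>      -- s_wait_low
+          IF in_sl = '0' THEN
+            nxt_state <= s_idle;
+          END IF;
+      END CASE;
+    END PROCESS;
+  END rtl;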
+
+
+11) File layout
+
+a) Copyright statement
+
+The file should start with a copyright statement that includes the date of
+creation, the author(s) and the affiliation. For example, for ASTRON:
+
+-------------------------------------------------------------------------------
+--
+-- Copyright (C) 2011
+-- ASTRON (Netherlands Institute for Radio Astronomy) <http://www.astron.nl/>
+-- P.O.Box 2, 7990 AA Dwingeloo, The Netherlands
+--
+-- This program is free software: you can redistribute it and/or modify
+-- it under the terms of the GNU General Public License as published by
+-- the Free Software Foundation, either version 3 of the License, or
+-- (at your option) any later version.
+--
+-- This program is distributed in the hope that it will be useful,
+-- but WITHOUT ANY WARRANTY; without even the implied warranty of
+-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+-- GNU General Public License for more details.
+--
+-- You should have received a copy of the GNU General Public License
+-- along with this program.  If not, see <http://www.gnu.org/licenses/>.
+--
+-------------------------------------------------------------------------------
+
+b) Purpose and description
+Next the file should contain a 'purpose' that summarizes the purpose of the
+component in one or at most two sentences. The 'description' then provides a
+more detailed description of the function of the component. Optionally there
+can also be a 'remarks' section that lists some particularities of the
+component. For test benches there should also be a 'usage' section that
+shows how to run the test bench in simulation and that briefly describes the
+expected result.
+
+-- Purpose: 
+-- Description:
+-- Remarks:
+-- . 
+-- . 
+-- Usage:        -- for test benches
+-- > as 10
+-- > run -all
+-- The tb is self stopping and self checking
+
+
+c) Put entity, architecture and configuration in same file
+Use a single file to completely describe a module. A possible exception is
+when completely different architectures must be used, e.g. for different
+logic families. Limit this situation to a few low level IP modules.
+
+By default there is only one entity and one architecture in the file.
+Exceptions can be:
+- Auxiliary entities that are instantiated only in this file. The top level
+  entity corresponds to the file name and needs to appear last in the file
+  due to the compile order, e.g. dp_repack_data.vhd, which uses the local
+  entities dp_repack_in and dp_repack_out.
+- Multiple architectures that show different implementation schemes. The
+  last architecture is the default, e.g. uth_rx.vhd and uth_tx.vhd.
+
+d) Port order
+Ports are listed in logical groups, with clock and reset first. Typically
+first list the streaming inputs and then the streaming outputs. The MM control
+ports are also grouped. In case of multiple clock domains the clock may be
+listed close to its group or centrally at the top.
+
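+Below is a sketch of this port order for a hypothetical entity with one
+streaming input, one streaming output, and an MM register interface in a
+separate clock domain. Only the entity declaration is shown, and all port
+names are assumptions.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+
+  ENTITY example_port_order IS
+    GENERIC (
+      g_dat_w     : NATURAL := 16;
+      g_adr_w     : NATURAL := 4;
+      g_reg_dat_w : NATURAL := 32
+    );
+    PORT (
+      -- Clocks and resets first, here centrally at the top
+      mm_rst     : IN  STD_LOGIC;
+      mm_clk     : IN  STD_LOGIC;
+      dp_rst     : IN  STD_LOGIC;
+      dp_clk     : IN  STD_LOGIC;
+      -- Streaming input
+      in_dat     : IN  STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      in_val     : IN  STD_LOGIC;
+      -- Streaming output
+      out_dat    : OUT STD_LOGIC_VECTOR(g_dat_w-1 DOWNTO 0);
+      out_val    : OUT STD_LOGIC;
+      -- MM control interface (in the mm_clk domain)
+      reg_adr    : IN  STD_LOGIC_VECTOR(g_adr_w-1 DOWNTO 0);
+      reg_wr     : IN  STD_LOGIC;
+      reg_wrdata : IN  STD_LOGIC_VECTOR(g_reg_dat_w-1 DOWNTO 0);
+      reg_rd     : IN  STD_LOGIC;
+      reg_rddata : OUT STD_LOGIC_VECTOR(g_reg_dat_w-1 DOWNTO 0)
+    );
+  END example_port_order;
+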
+e) Declaration order
+In the declaration section of an architecture, if possible place declarations
+in the following order:
+
+- Local types
+- Local constants
+- Local functions, procedures
+- Signals
+
+The detailed declaration order should as much as possible follow the functional
+flow from component input to output and from begin to end of the architecture.
+
+
+f) Avoid mixing structural and RTL description code in the same architecture
+In general an architecture contains only RTL code (i.e. with process
+statements) or it only instantiates other components to create structure. An
+exception is e.g. the use of some low level components from the common library
+in an 'rtl' architecture. Keeping only structural instantiations in a 'str'
+architecture keeps the functionality cleaner and therefore clearer.
+
+
+g) Comment
+Typically the HDL code needs to be self explanatory, i.e. properly set up, so
+there is no need for comment. The purpose of the module can be described in the
+HDL source (e.g. above the entity) or in a separate design document.
+
+In some cases inline comment is needed though to help the reader. Try to write
+the comment such that it is still valid even if the code is changed. Care must
+be taken that the comment is up to date and accurate. This is often not the
+case, because most designers forget to also update the comment when they
+modify the code. This is a good reason to minimize the amount of comment,
+because inaccurate comment is worse than no comment.
+
+In any case the comment should not explain the VHDL syntax (the assumption is
+that the reader knows VHDL). Comment should also not be used to disable
+obsolete code, because we have SVN to keep the obsolete data. In general
+comment should also not be used as a compile option, because for that generate
+statements should be used.
+
+
+h) Indent and alignment
+
+Default use a fixed indent of 2 spaces. VHDL has BEGIN and END statements that
+group a section so 2 spaces indent is sufficient, however eg. 4 spaces is also
+allowed. Within one file the indent should be fixed. Take care of proper
+alignment, e.g. all declared signals should be aligned at the colon ':'.
+
+
+i) TABs and spaces - Do not use TABs
+Using TABs is not allowed in the source code, because they have different sizes
+in different editors which can disturb the layout. Instead set the editor to
+fill in 2 spaces when the TAB key is used. Note that setting the TAB size to 2
+spaces is not enough; set the TAB key such that it truly inserts 2 spaces.
+
+
+j) Line length
+For the purpose and description try to keep the comment line length <= 80 to
+allow printing it in Courier on A4 without line wraps. For the rest of the
+file the line length can be larger, up to about 200, such that the lines still
+fit on a screen.
+
+k) Use separate line for each statement
+
+l) Place each declaration on a separate line
+
+
+
+
+12) Standard packages
+
+For signed and unsigned only use (so do not use other packages with similar
+functions):
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+  USE IEEE.NUMERIC_STD.ALL;
+
+
+
+13) Use of records
+
+Using records makes signal interfaces clearer. Typically the input signals
+of an interface are grouped into one record and the output signals of that
+interface are grouped into another related record. However, defining too many
+different types of records makes the VHDL less readable, because it is then not
+easy to remember what each record type means. For the main Memory Mapped (MM)
+and Streaming (ST, also called DP for data path) interfaces it appears possible
+to define just a few standard record types that can be used in all MM and ST
+components. These records are defined in:
+
+- MM : common_mem(pkg).vhd
+- ST : dp_stream(pkg).vhd
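+
+The sketch below illustrates the pairing of an input record and an output
+record. It defines two strongly simplified, hypothetical records in a package
+example_st_pkg. They are not the actual SOSI and SISO records defined in
+dp_stream(pkg).vhd, which contain more fields.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+
+  PACKAGE example_st_pkg IS
+    CONSTANT c_example_st_data_w : NATURAL := 32;
+
+    -- Source out sink in: signals driven by the source
+    TYPE t_example_sosi IS RECORD
+      data  : STD_LOGIC_VECTOR(c_example_st_data_w-1 DOWNTO 0);
+      valid : STD_LOGIC;
+      sop   : STD_LOGIC;
+      eop   : STD_LOGIC;
+    END RECORD;
+
+    -- Source in sink out: flow control driven by the sink
+    TYPE t_example_siso IS RECORD
+      ready : STD_LOGIC;
+    END RECORD;
+
+    TYPE t_example_sosi_arr IS ARRAY (INTEGER RANGE <>) OF t_example_sosi;
+
+    -- Default values, to assign a whole record at once without missing a field
+    CONSTANT c_example_sosi_rst : t_example_sosi := (data => (OTHERS => '0'), valid => '0', sop => '0', eop => '0');
+    CONSTANT c_example_siso_rdy : t_example_siso := (ready => '1');
+  END example_st_pkg;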
+
+
+
+14) Use of two-dimensional arrays
+
+Multi-dimensional arrays can be declared and indexed in several ways:
+
+a) As matrix of elements -> index(I,J):
+  TYPE t_sl_matrix IS ARRAY(INTEGER RANGE<>, INTEGER RANGE <>) OF STD_LOGIC;
+  TYPE t_data_matrix IS ARRAY (0 TO c_nof_tlen-1, 0 TO c_nof_input-1) OF STD_LOGIC_VECTOR(g_data_w-1 DOWNTO 0);
+    
+  SIGNAL a_mat  : t_sl_matrix(0 TO c_nof_tlen-1, 0 TO c_nof_input-1);
+  SIGNAL d_mat  : t_data_matrix;
+    
+b) As array of arrays -> index(I)(J):
+  -- The element range needs to be set in the TYPE.
+  TYPE t_sosi_2arr IS ARRAY (INTEGER RANGE <>) OF t_dp_sosi_arr(0 TO c_nof_input-1);
+  SIGNAL a_sosi_2arr : t_sosi_2arr(0 TO c_nof_tlen-1);
+  
+c) As a 1-dimensional array --> index( I * length(J) + J)
+
+d) As a fixed number of declared 1 dim arrays --> index I in name and J in arr
+     SIGNAL input_arr0 : t_dp_sosi_arr(0 TO c_nof_input-1);
+     SIGNAL input_arr1 : t_dp_sosi_arr(0 TO c_nof_input-1);
+     SIGNAL input_arr2 : t_dp_sosi_arr(0 TO c_nof_input-1);
+     SIGNAL input_arr3 : t_dp_sosi_arr(0 TO c_nof_input-1);
+
+Of course option d) is less nice, because the number of I arrays is fixed in
+the declared row names. However, for some cases it may be appropriate. Options
+a) and b) are the true 2-dim arrays. Option a) has the disadvantage that the
+elements can only be assigned per element; it is not possible to assign all J
+with a 1-dim array of equal length. This is possible with option b), but
+option b) has the disadvantage that the length of the element array must be
+known at the TYPE definition. Typically option b) then requires to define this
+2-dim array in a module package, because only then the TYPE can be used in
+multiple files. E.g.:
+
+  CONSTANT c_<module_name>_nof_j_max : NATURAL := 32;
+  TYPE t_<module_name>_i_j_2arr IS ARRAY (INTEGER RANGE <>) OF t_dp_sosi_arr(c_<module_name>_nof_j_max-1 DOWNTO 0);
+
+  out_sosi_2arr : OUT t_<module_name>_i_j_2arr(g_nof_i-1 DOWNTO 0);  -- only the I range can be set, the J range is fixed by the TYPE
+  
+An alternative can be to not use a 2-dim array like a) or b) but instead use an
+aggregate 1-dim array like c). E.g.:
+   
+  out_sosi_arr  : OUT t_dp_sosi_arr(g_nof_i*g_nof_j-1 DOWNTO 0);
+
+The advantage of such an aggregate 1-dim array is that it can be entirely set
+by means of generics. The disadvantage is that the 2-dim indexing is a bit
+more complicated and that the 2-dim properties of a signal do not show as such
+in a Wave Window. A way around is to internally still declare a local signal
+of a true 2-dim array type like a) and use that to monitor the 1-dim port
+signal. E.g. for port out_sosi_arr above this then yields out_sosi_matrix below:
+
+  TYPE t_sosi_matrix IS ARRAY (INTEGER RANGE <>, INTEGER RANGE <>) OF t_dp_sosi;
+  SIGNAL out_sosi_matrix : t_sosi_matrix(g_nof_i-1 DOWNTO 0, g_nof_j-1 DOWNTO 0);
+  
+  p_wires : PROCESS(out_sosi_arr)
+  BEGIN
+    FOR I IN g_nof_i-1 DOWNTO 0 LOOP
+      FOR J IN g_nof_j-1 DOWNTO 0 LOOP
+        out_sosi_matrix(I,J) <= out_sosi_arr(I*g_nof_j+J);
+      END LOOP;
+    END LOOP;
+  END PROCESS;
+
+
+  
+15) Use of constants, generics and packages
+
+How and where to define a constant depends on the function of the constant.
+Constants can be defined:
+
+1) Locally in the architecture
+2) As default value of a generic that typically does not need to be changed
+3) As a global constant in a package that can be used directly in any
+   architecture that uses this package
+4) As a global constant in a package that is used as default value for a
+   generic
+   
+The advantage of using generics is that the parameter can be changed by the
+block that instantiates it. If the generic is propagated up the hierarchy,
+then it can be controlled in a test bench, which is useful to verify multiple
+generic settings in simulation without having to edit the code.
+
+The disadvantage of using generics to pass on constant values is that this
+can burden the user with detailed knowledge of the component.
+
+Constants in a package are useful to make them known in lower hierarchy blocks
+without having to pass them on via the entity generics. The limitation of
+constants in packages is that they can only be changed by editing the file.
+
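+Below is a hedged sketch of the four options, using a hypothetical package
+example_cfg_pkg and entity example_fifo_ctrl. The comments 1) to 4) refer to
+the list above. c_example_mm_dat_w is used directly, and c_example_fifo_size
+is the default of a generic, so a test bench can still override it.
+
+  PACKAGE example_cfg_pkg IS
+    CONSTANT c_example_mm_dat_w  : NATURAL := 32;    -- 3) used directly in an architecture
+    CONSTANT c_example_fifo_size : NATURAL := 256;   -- 4) used as default value of a generic
+  END example_cfg_pkg;
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+  USE IEEE.NUMERIC_STD.ALL;
+  USE work.example_cfg_pkg.ALL;
+
+  ENTITY example_fifo_ctrl IS
+    GENERIC (
+      g_fifo_size : NATURAL := c_example_fifo_size;   -- 4) package constant as generic default
+      g_fill_thr  : NATURAL := 16                     -- 2) generic that typically needs no change
+    );
+    PORT (
+      in_fill    : IN  NATURAL RANGE 0 TO g_fifo_size;
+      out_almost : OUT STD_LOGIC;
+      out_fill   : OUT STD_LOGIC_VECTOR(c_example_mm_dat_w-1 DOWNTO 0)
+    );
+  END example_fifo_ctrl;
+
+  ARCHITECTURE rtl OF example_fifo_ctrl IS
+    CONSTANT c_almost_full : NATURAL := g_fifo_size - g_fill_thr;   -- 1) local constant
+  BEGIN
+    out_almost <= '1' WHEN in_fill >= c_almost_full ELSE '0';
+    out_fill   <= STD_LOGIC_VECTOR(TO_UNSIGNED(in_fill, c_example_mm_dat_w));
+  END rtl;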
+
+
+16) Functions
+
+Functions have input arguments and return a single value. Functions cannot
+wait for a clock, so they work combinatorially (= immediately). Functions
+are useful to:
+  - derive constant or generic values that depend on input conditions
+  - map a signal to another signal, e.g. to change the order in an array
+ 
+For many examples of reusable functions and some procedures see:
+  - common_pkg.vhd
+  - dp_stream_pkg.vhd
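+
+The sketch below shows a function that derives a constant value. It sits in a
+hypothetical package example_func_pkg with package prefix 'exf' and is named
+according to the '<package_prefix>_func_...' convention. It returns the number
+of bits needed to index n items, e.g. to derive an address width from a
+generic. This is an illustration, not a copy of a function from common_pkg.
+
+  PACKAGE example_func_pkg IS
+    -- Number of bits needed to index n items, minimum 1, e.g. 8 -> 3, 9 -> 4
+    FUNCTION exf_func_ceil_log2(n : NATURAL) RETURN NATURAL;
+  END example_func_pkg;
+
+  PACKAGE BODY example_func_pkg IS
+    FUNCTION exf_func_ceil_log2(n : NATURAL) RETURN NATURAL IS
+      VARIABLE v_w : NATURAL := 1;
+    BEGIN
+      -- Valid for n <= 2**30, beyond that 2**v_w would overflow INTEGER
+      WHILE 2**v_w < n LOOP
+        v_w := v_w + 1;
+      END LOOP;
+      RETURN v_w;
+    END exf_func_ceil_log2;
+  END example_func_pkg;
+
+  -- Usage in an architecture that uses work.example_func_pkg.ALL:
+  --   CONSTANT c_adr_w : NATURAL := exf_func_ceil_log2(g_nof_words);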
+  
+
+
+17) Procedures
+
+Procedures have input, output or inout arguments and no return value:
+
+  - A procedure with only inputs seems not useful.
+  - A procedure can drive one or more outputs. Procedures can wait for a
+    clock, so they can output (generate) a sequence in time. This is useful
+    in a test bench.
+  - Using an inout argument is useful if the procedure needs to maintain some
+    storage, e.g. the previous value of the current output. The storage cannot
+    be defined inside the procedure, because its local variables do not keep
+    their value between calls. Instead, the inout argument is connected to a
+    signal that is declared outside the procedure to act as the storage.
+  
+When using in, out and inout arguments the difference between a procedure
+and an entity becomes small. The advantage of an entity is that it can have
+internal state (storage) and components are more intuitive for creating
+schematic structure and hierarchy. A procedure can be synthesized, but
+typically procedures are more used in test benches and entities are more used
+in synthesizable code.
+
+The signal connected to a procedure output can be an array(I) element
+provided that the index I comes from a GENERATE statement. If the index I
+comes from a LOOP statement then the compiler gives Error: "(vcom-1450)
+Actual (indexed name) for formal "output argument" is not a static signal
+name"
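+
+Below is a minimal test bench sketch of these points, using hypothetical
+names. proc_example_pulse waits for the clock to output a pulse sequence, and
+proc_example_toggle uses an inout argument connected to the signal toggle as
+storage. Each gen_pulse instance passes pulse_arr(I) as the output actual,
+which is a static signal name because I is a generate parameter.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+
+  ENTITY tb_example_proc IS
+  END tb_example_proc;
+
+  ARCHITECTURE tb OF tb_example_proc IS
+    CONSTANT c_clk_period : TIME := 10 ns;
+    CONSTANT c_nof_pulse  : NATURAL := 4;
+
+    -- Output argument: drives a pulse of nof_cycles clock cycles
+    PROCEDURE proc_example_pulse(CONSTANT nof_cycles : IN  NATURAL;
+                                 SIGNAL   clk        : IN  STD_LOGIC;
+                                 SIGNAL   pulse      : OUT STD_LOGIC) IS
+    BEGIN
+      WAIT UNTIL rising_edge(clk);
+      pulse <= '1';
+      FOR I IN 1 TO nof_cycles LOOP
+        WAIT UNTIL rising_edge(clk);
+      END LOOP;
+      pulse <= '0';
+    END proc_example_pulse;
+
+    -- Inout argument: the storage is the actual signal declared outside the procedure
+    PROCEDURE proc_example_toggle(SIGNAL clk    : IN    STD_LOGIC;
+                                  SIGNAL toggle : INOUT STD_LOGIC) IS
+    BEGIN
+      WAIT UNTIL rising_edge(clk);
+      toggle <= NOT toggle;
+    END proc_example_toggle;
+
+    SIGNAL tb_end    : STD_LOGIC := '0';
+    SIGNAL clk       : STD_LOGIC := '1';
+    SIGNAL pulse_arr : STD_LOGIC_VECTOR(c_nof_pulse-1 DOWNTO 0) := (OTHERS => '0');
+    SIGNAL toggle    : STD_LOGIC := '0';
+  BEGIN
+    clk <= NOT clk OR tb_end AFTER c_clk_period/2;
+
+    gen_pulse : FOR I IN 0 TO c_nof_pulse-1 GENERATE
+      p_pulse : PROCESS
+      BEGIN
+        proc_example_pulse(I+1, clk, pulse_arr(I));   -- OK: I is a generate parameter
+        WAIT;
+      END PROCESS;
+    END GENERATE;
+
+    -- Calling proc_example_pulse(I+1, clk, pulse_arr(I)) from a FOR I ... LOOP
+    -- inside one process instead would give the vcom-1450 error mentioned above.
+
+    p_toggle : PROCESS
+    BEGIN
+      FOR I IN 0 TO c_nof_pulse+1 LOOP   -- outlasts the longest pulse
+        proc_example_toggle(clk, toggle);
+      END LOOP;
+      tb_end <= '1';
+      WAIT;
+    END PROCESS;
+  END tb;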
+
+For many examples of reusable procedures and some functions see:
+  - tb_common_pkg.vhd
+  - tb_dp_pkg.vhd
+
+
+
+18) Test benches
+
+A device under test (DUT) can be a component, a module or a complete VHDL
+design that will run on the FPGA. In general all components, modules and
+designs should have a VHDL test bench to verify that they work.
+Consider that if you do not make a test bench for some component, then you
+should not have made that component. If it is not worth testing, it is not
+worth implementing. This rule can help development to focus on the
+functionality that is really essential for the design.
+
+Test benches are equally important as the design itself. Therefore the test
+bench code should also be well structured.
+
+Test bench code does not go into the FPGA, therefore it is kept in a
+separate directory together with behavioral VHDL models of some peripherals
+(like e.g. an ADC or an I2C slave).
+
+The testing of a DUT in simulation is called 'verification'. The testing of a
+DUT on FPGA hardware is called 'validation'.
+
+a) Verifying a DUT
+   Test benches provide the environment in VHDL to verify a device under test
+   (DUT). The DUT can be anything from a low level component to an entire
+   system.
+   The test bench environment consists of:
+   
+   - Stimuli that drive inputs of the DUT (e.g. clk, data)
+   - The DUT
+   - Behavioral models that model an external device (e.g. a sensor, an ADC)
+   - Verification that checks the outputs of the DUT
+   
+   The verification can be:
+   
+   - Monitoring (e.g. manually observing a signal in the Wave window)
+   - Logging to a file that can be checked by means of a diff with a golden
+     reference file.
+   - Self checking (comparing the DUT output with pre-calculated reference 
+     data or with stored golden reference data)
+      
+b) Test bench interface packages and components
+   The following packages provide useful procedures for making test benches:
+   
+   - tb_common_pkg     - General flow
+   - tb_common_mem_pkg - MM interfacing
+   - tb_dp_pkg         - DP interfacing
+   
+   The tst/ module provides file IO.
+   
+c) Multi-level test bench
+   A test bench can have generics that allow multiple variations of the
+   test bench to be instantiated in a higher level test bench. This multi-test
+   bench can then simulate these variations in parallel and serve as a
+   regression test for the DUT.
+   These DUT regression test benches can again be instantiated in yet another
+   test bench to have a system level regression test bench.
+   
+   For self stopping test benches the tb_end signal needs to be output via the
+   test bench entity, such that the multi-level test bench can issue its tb_end
+   when all tb instances have raised their tb_end.
+
+d) Self checking and self stopping
+   The preferred scheme is a self checking and self stopping test bench that
+   reports errors if they occur and runs as long as necessary. This is useful
+   when the test bench is run manually and moreover it prepares the test bench
+   to be used in a regression test.
+
+   Errors can be reported via ASSERT or REPORT with SEVERITY ERROR.
+   
+   A simulation of a test bench can be stopped automatically by stopping all
+   toggling when the stimuli have finished. Typically all toggling can be
+   stopped by making tb_end <= '1' and using tb_end in the clock statement, e.g.:
+   
+     clk <= NOT clk OR tb_end AFTER clk_period/2;
+   
+   For some IP stopping the clocks and applying the resets using tb_end is not
+   enough to stop the simulation, apparently because some process still keeps
+   on free running (eg. a VCO in a PLL). In those cases the simulation can be
+   forced to stop by asserting a FAILURE:
+   
+     REPORT "Tb simulation finished." SEVERITY FAILURE;
+   
+   The disadvantage of this scheme is that it can be confusing that the test
+   went OK but finishes with a failure. A better solution is therefore to
+   still set tb_end <= '1' to signal the end of the simulation, but use the
+   Modelsim 'when' command to actually stop the simulation. The when command
+   can be issued only when a simulation is loaded:
+   
+     when -label tb_end {tb_end == '1'} {echo "Tb simulation finished" ; stop ;}
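+   
+   Below is a minimal sketch of a self checking and self stopping test bench,
+   using hypothetical names. To keep it self-contained, the DUT is replaced by
+   a one clock cycle register process p_dut. In a real test bench this would
+   be an instance of the DUT entity.
+   
+     LIBRARY IEEE;
+     USE IEEE.STD_LOGIC_1164.ALL;
+     USE IEEE.NUMERIC_STD.ALL;
+
+     ENTITY tb_example_self_check IS
+     END tb_example_self_check;
+
+     ARCHITECTURE tb OF tb_example_self_check IS
+       CONSTANT c_clk_period : TIME := 10 ns;
+       CONSTANT c_dat_w      : NATURAL := 8;
+       CONSTANT c_nof_dat    : NATURAL := 10;
+
+       SIGNAL tb_end  : STD_LOGIC := '0';
+       SIGNAL clk     : STD_LOGIC := '1';
+       SIGNAL in_dat  : UNSIGNED(c_dat_w-1 DOWNTO 0) := (OTHERS => '0');
+       SIGNAL in_val  : STD_LOGIC := '0';
+       SIGNAL out_dat : UNSIGNED(c_dat_w-1 DOWNTO 0) := (OTHERS => '0');
+       SIGNAL out_val : STD_LOGIC := '0';
+     BEGIN
+       -- The clock stops toggling when tb_end = '1'
+       clk <= NOT clk OR tb_end AFTER c_clk_period/2;
+
+       -- Stimuli: valid data 0, 1, 2, ...
+       p_stimuli : PROCESS
+       BEGIN
+         FOR I IN 0 TO c_nof_dat-1 LOOP
+           WAIT UNTIL rising_edge(clk);
+           in_dat <= TO_UNSIGNED(I, c_dat_w);
+           in_val <= '1';
+         END LOOP;
+         WAIT UNTIL rising_edge(clk);
+         in_val <= '0';
+         WAIT;
+       END PROCESS;
+
+       -- Stand-in for the DUT: a one clock cycle pipeline register
+       p_dut : PROCESS(clk)
+       BEGIN
+         IF rising_edge(clk) THEN
+           out_dat <= in_dat;
+           out_val <= in_val;
+         END IF;
+       END PROCESS;
+
+       -- Verification: expect the same sequence, then end the simulation
+       p_verify : PROCESS
+         VARIABLE v_exp : NATURAL := 0;
+       BEGIN
+         WHILE v_exp < c_nof_dat LOOP
+           WAIT UNTIL rising_edge(clk);
+           IF out_val = '1' THEN
+             ASSERT out_dat = TO_UNSIGNED(v_exp, c_dat_w)
+               REPORT "Wrong out_dat" SEVERITY ERROR;
+             v_exp := v_exp + 1;
+           END IF;
+         END LOOP;
+         REPORT "Tb simulation finished." SEVERITY NOTE;
+         tb_end <= '1';
+         WAIT;
+       END PROCESS;
+     END tb;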
+   
+e) Python test case using MM interface
+   For a DUT with an MM interface the stimuli and verification can be done
+   using a Python test case. The Python test case then communicates with the
+   DUT in Modelsim via file IO.
+   
+   The Python test case can stop the simulation by raising tb_end via a
+   dedicated file IO handler, e.g. similar to the one used for polling the
+   simulation time.
+   
+   
+   
+19) DP streaming component development example
+
+a) DP component example: dp_packet_merge.vhd
+   The description in dp_packet_merge.vhd explains how a DP streaming component
+   with flow control can be developed. It also explains how to do this within
+   the Gaisler two-process method.
+b) DP test bench example: tb_dp_example_no_dut.vhd
+   The tb_dp_example_no_dut.vhd provides an example test bench without DUT that
+   describes and shows how a streaming DP component can be verified using
+   stimuli procedures and verification procedures from tb_dp_pkg.
+    
+   
+   
+20) Simulation and synthesis debugging
+
+a) Compile errors
+Most compile errors are easily fixed. Fix the first 1 or 2 errors and then
+recompile, because some other errors may then disappear as well and new
+errors may appear. Sometimes it is more difficult to identify the cause of a
+compile error. Then first carefully read the error message and e.g. use
+Modelsim 'verror <error message number>' and/or Google (part of) the error
+message to get more info. If the cause is still not clear, then try to isolate
+the cause by commenting out more and more parts of the code until the error
+disappears.
+
+b) Simulation error
+Make sure that the simulation has loaded the implementations of all components
+(eg. there are no empty black boxes and all memory initialization files have
+been found). Most simulation errors can be debugged by tracing the signals in
+the Wave Window and through the source code. It is almost never necessary to
+use break points and to step through the VHDL code.
+
+c) General debugging
+- Try to consider a bug as a valuable insight into how the design is (not)
+  working. When the bug is fixed the design will be better than it was before.
+- Discuss the debug steps with one or more colleagues to get fresh ideas and to
+  avoid misconceptions and blind spots.
+- Narrow down the problem until it disappears and then build up the design
+  again until it reappears.
+- Make step by step changes that are sufficiently small to isolate the
+  problem.
+- Compare the design that fails with a working design (e.g. a previous version)
+  and identify the differences.
+- In case of IP related bugs carefully read the manual (again) and search the
+  web for reports on how to tackle IP problems.
+- Stay calm and carefully observe the results of each debug step to determine
+  the best next step.
+- A bug that takes longer than expected to solve can feel like a burden (e.g.
+  due to project time pressure, due to lack of progress, due to incompetence),
+  be open about this when discussing this with colleagues ('sof' = share our
+  feelings).
+    
+d) Hardware debugging
+A device under test (DUT) can be simulated using a VHDL test bench. With proper
+models of the DUT environment the simulation of the DUT should yield the same
+result as running the synthesized DUT on FPGA hardware. If a mismatch between
+simulation and hardware behaviour does occur, then check the following:
+
+- In case of a new board:
+  . are the chips mounted correctly and are the chip type numbers correct?
+  . are the values of e.g. termination resistors and capacitors correct?
+- In case of a new FPGA:
+  . First try all FPGA interfaces using Xilinx/Altera reference designs that
+    have been shown to work on development boards that use the same FPGA.
+  . Ask for on-site help from a Xilinx/Altera engineer to speed up the learning
+    curve on debugging the FPGA IO (especially transceivers and DDR memory).
+  . is the FPGA type used for synthesis correct?
+  . do you use the latest version of the synthesis tools? With new FPGAs there
+    may be important tool updates.
+- Is the board still OK, i.e. is e.g. unb_minimal still working? Are the board
+  power supplies OK and, if necessary, are the external clock and PPS
+  connected?
+- If the design used to work and now no longer does, recall which source files
+  were modified.
+- If an old design still worked, then use a clean SVN check out and a clean
+  build to go back to this design. If this design works, then go to a newer
+  version in SVN and try that on hardware. Continue this kind of binary search
+  until the SVN version that works and the next one that does not have been
+  found. Check the SVN log and the SVN diff of the file(s) that differ to
+  determine how these differences can cause the malfunctioning.
+- Be precise in noticing what exactly is not working without jumping to
+  conclusions too soon. First observe and analyse the bug to obtain as many
+  clues as possible.
+- Is the environment model sufficiently accurate? If feasible try to improve
+  the model.
+- If the simulation works but on hardware it fails, then check that all
+  generics and derived constants that are different on hardware have scaled
+  accordingly. Maybe a generic or derived constant is hardcoded and keeps its
+  simulation value, which is then wrong for hardware.
+- Check all warnings in the synthesis and fitter reports.
+- Does the synthesized DUT pass the timing constraints? If not then first fix
+  this.
+- Is the resource usage per instance as expected or is there functionality
+  that gets optimized away unintentionally? --> search for 'away' in the
+  synthesis report.
+- Is the pin definitions file correct and complete?
+- Are the RAM initialization files found by the synthesis tool? --> e.g. for
+  the Nios program, a BG, the coefficients of a FIR filter
+- Are the constraint files found and interpreted correctly by the synthesis
+  tool?
+- Does the RTL code have special VHDL constructions that might be interpreted
+  differently by the synthesis tool (e.g. comparing std_logic_vectors of
+  different length --> better compare them as unsigned values)?
+- Are the process sensitivity lists complete (no used inputs missing)? -->
+  search for 'sensitivity' in the synthesis report.
+- Are there no latches in the design? --> search for 'latch' in the synthesis
+  report.
+- There should not be any combinational loops in the design --> search for
+  'combinational loop' in the fitter report.
+- Are there uninitialized signal values that may be interpreted differently in
+  synthesis? --> eg. use the c_dp_siso_rst/rdy/hold/flush constants to default
+  assign a record signal rather than assigning each record field individually
+  to avoid missing a field.
+- Are the RAM block initializations the same for both the simulation and
+  synthesis?
+- Are the correct clocks and resets connected to the different clock domains?
+- In case of occasional failure on hardware maybe a clock domain crossing is
+  not done properly.
+- Are the signals (pulses) and busses transferred properly across clock
+  domains?
+- Does the (software) functionality take account of the latency that it takes
+  to let a signal reliably cross a clock domain?
+- For DDR3 and gigabit transceiver IP there may be an IP toolkit option to
+  observe IP internal status via JTAG
+- For bugs that take weeks to solve consider asking help from an external
+  (Altera or Xilinx) engineer that knows the IP.
+  
+If this all does not help to identify the cause then:
+- Add some more DUT signals to a monitor MM register that can be accessed via
+  the 1GbE control interface
+- Do a functional simulation of the post synthesis netlist of the design (i.e.
+  without the timing, but with the FPGA logic) to verify whether synthesis has
+  interpreted the RTL the same way as the simulator.
+- Use e.g. Altera SignalTap or Xilinx ChipScope to add an embedded logic
+  analyser to the DUT that can be accessed via JTAG.
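+
+This sketch illustrates the checklist item on comparing std_logic_vectors of
+different length. The predefined '=' returns FALSE for arrays of different
+length. The predefined '<' compares the elements from left to right, so
+"10" < "011" is FALSE although 2 < 3. Converting both operands to UNSIGNED
+gives the numeric comparison. The entity below is hypothetical.
+
+  LIBRARY IEEE;
+  USE IEEE.STD_LOGIC_1164.ALL;
+  USE IEEE.NUMERIC_STD.ALL;
+
+  ENTITY example_compare IS
+    GENERIC (
+      g_a_w : NATURAL := 2;
+      g_b_w : NATURAL := 3
+    );
+    PORT (
+      a_slv  : IN  STD_LOGIC_VECTOR(g_a_w-1 DOWNTO 0);   -- e.g. "10"  = 2
+      b_slv  : IN  STD_LOGIC_VECTOR(g_b_w-1 DOWNTO 0);   -- e.g. "011" = 3
+      lt_slv : OUT STD_LOGIC;   -- predefined array compare: '0' for these example values
+      lt_uns : OUT STD_LOGIC    -- numeric compare via UNSIGNED: '1' for these example values
+    );
+  END example_compare;
+
+  ARCHITECTURE rtl OF example_compare IS
+  BEGIN
+    lt_slv <= '1' WHEN a_slv < b_slv ELSE '0';                     -- element by element, left to right
+    lt_uns <= '1' WHEN UNSIGNED(a_slv) < UNSIGNED(b_slv) ELSE '0'; -- numeric, lengths may differ
+  END rtl;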
-- 
GitLab