erko_firmware_overview.txt

Author: Eric Kooistra, jan 2018
Title: Key aspects of FPGA firmware development at RTSD

Purpose:
- Provide a list of key aspects of FPGA firmware development at RTSD
- Identify libraries or tool scripts that we could isolate and make public via e.g. OpenCores or GitHub
- Identify topics that we need to focus on in the future


1) Develop FPGA hardware boards
  - Review board design document and schematic and layout, so that the board will not contain major bugs and
    so that firmware engineers can already learn about the board and get familiar with it
  - Pinning design to verify schematic
  - Vendor reference designs to verify the IO
  - Heater design to verify the cooling and the power supplies
  
2) Technology independent FPGA:
  - Wrap IP (IO, DSP, memory, PLL)
  - Xilinx (LOFAR, SKA CSP Low)
  - Altera (Aartfaac, Apertif, Arts)
  
3) VHDL design:
  - Clean coding
  - Reuse through HDL libraries
  - Standard interfaces: MM & ST, support Avalon, AXI using VHDL records mosi/miso, sosi/siso
  - Use records not only for signals but also for generics, because adding a record field does not change the 
    component interface.
  - Distinguish between state registers and pipeline registers.
    . For example: dp_block_resize.vhd, dp_counter.vhd.
  - Board minimal design that provides control access to the FPGA board and the board control functions
  - Board test design that contains the minimal design plus interfaces to use the board IO (transceivers, DDR)
  - Build FPGA application design upon a board minimal design and the relevant IO from the board test design
  - Useful libraries and packages:
    . base: common, dp, mm, diag, reorder, uth
    . dsp: wpfb, bf, correlator, st
    . io: eth, io_ddr, i2c
  - Design for scalability with generics that can be scaled over the logical range, e.g. >= 0, even if the
    application only requires a certain fixed value. The reasons are:
    . During development the application typically starts small (e.g. a BF with 4 inputs) while the final
      application is much larger (e.g. a BF with 64 inputs). With generics both can be supported through a
      parameter change.
    . For simulation it is often necessary to reduce the size of the design to be able to simulate it in a
      reasonable time. By scaling it down via generics the design preserves its structure but becomes much
      smaller. 
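
  The scaling argument above can be illustrated with a small Python model. This is a minimal sketch, not
  RTSD code: the function names and the parameter nof_inputs (standing in for a VHDL generic) are
  hypothetical. The same structure runs unchanged for 0, 4 or 64 inputs.

```python
# Hypothetical model, not RTSD code: a weighted-sum beamformer (BF) whose
# size is set by one parameter, mirroring a VHDL generic for the input count.


def beamform(samples, weights):
    """Return the weighted sum of one sample per input."""
    if len(samples) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(w * s for w, s in zip(weights, samples))


def make_test_vector(nof_inputs):
    """Create deterministic stimuli for any nof_inputs >= 0."""
    samples = [complex(i + 1, -i) for i in range(nof_inputs)]
    weights = [1.0] * nof_inputs  # unit weights keep the expected sum simple
    return samples, weights


def run_bf(nof_inputs):
    """Run the model at the requested size; the structure does not change."""
    samples, weights = make_test_vector(nof_inputs)
    return beamform(samples, weights)
```

  Calling run_bf(4) exercises the small development configuration and run_bf(64) the full size, with
  only the parameter changed.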

3) VHDL testing:
  * Levels of application verification
    - use reference designs to verify the vendor PHY IO IP; in the application these are replaced by models.
      For example: transceiver, DDR3, MM interface via MM file IO, ...
    - detailed unit tests per HDL library using entity IO to prove that the unit is correct in all relevant
      use cases and corner cases, such that application tests can focus on integration tests.
    - integration top level or multi FPGA tests using MM file IO
      . MM file IO for testbenches at design level, 'breaking the hierarchy' in VHDL or providing access to Modelsim simulation with Python
      . preferably use MM file IO and revisions of the top level design to verify parts in the top level design, rather than making
        a testbench for only that part using the IO of that part. The control interface should be enough to test the part, therefore 
        using MM file IO is enough and avoids testbenches that make use of other entity IO signals. Typically the revision can contain BG
        and DB (with MM interface) to also have direct streaming access to the part in the top level.
  * regard the firmware as a data computer, so independent of its functional (astronomical) use we need to verify and validate that for a 
    known stream of input data it outputs the expected output data.
  * Verification via simulation:
    . use of g_sim, g_sim_record to differentiate between simulation and hardware
      speed up MM clk, I2C clk, skip PHY startup time, reduce size while keeping the structure,
      skip or bypass functions
    . use g_design_name to differentiate between revisions, e.g. to speed up simulation or synthesis
    . behavioral models of external IO (DDR, Transceivers, ADC, I2C)
    . break up data path using WG, BG, DB, data force
    . optional use of transparent DSP models to pass on indices.
    . verify data move by transporting meta data (indices) via the sosi data fields
    . profiler to know time consuming parts
  * VHDL regression test (if not there, then it is not used)
  * Validation on hardware
    . using Python peripherals for MM control using --cmd options per peripheral
    . construct more complicated control scripts using sequence of peripheral scripts and --cmd
    . we need proper data capture machines, to validate 10G, 40 GbE data output (e.g. using Wireshark and some Python code)
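
  The idea of verifying a data move by transporting indices in the data fields can be sketched in Python.
  This is a minimal sketch under assumptions: the transpose below is only a stand-in for a reorder
  function, and all names are hypothetical. Each sample carries its own input index, so the checker only
  needs to know where every index should end up.

```python
# Hypothetical sketch: the transpose is only an example of a data mover and
# the names are assumptions, not taken from the RTSD libraries.


def reorder_transpose(block, nof_rows, nof_cols):
    """Model of a data mover: read a row-major block out column by column."""
    return [block[r * nof_cols + c]
            for c in range(nof_cols)
            for r in range(nof_rows)]


def verify_transpose(output, nof_rows, nof_cols):
    """Check the index found at every output position.

    Returns a list of (position, received_index, expected_index) tuples,
    which is empty when the data move is correct.
    """
    errors = []
    for pos, index in enumerate(output):
        c, r = divmod(pos, nof_rows)  # output position -> (column, row)
        expected = r * nof_cols + c   # input index that belongs here
        if index != expected:
            errors.append((pos, index, expected))
    return errors


# Stimuli are the indices themselves instead of real samples.
indices = list(range(2 * 3))
moved = reorder_transpose(indices, nof_rows=2, nof_cols=3)
```

  Because the stimuli are plain indices, a failing position directly shows which input sample arrived
  in the wrong place.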

4) RadioHDL
  - RadioHDL is our umbrella name for a set of tool scripts that we use for firmware development, focus on implementation. RadioHDL makes
    it easier for developers to organize different versions and combinations of their firmware, tools and boards. RadioHDL is a platform?
  - The name RadioHDL covers HDL code for RadioAstronomy as a link to what we do at Astron. However by using only the word Radio we keep
    the name a bit more general, because in fact the RadioHDL tool scripts can be used for any (FPGA) HDL development. Outside Astron
    the word RadioHDL can be advertised as an HDL radio station that one likes to listen to, i.e. to use, so a feel good name with a
    strong link to HDL but otherwise not explicitly telling what it is. The word RadioHDL also has no hits in Google search, so no
    conflict or confusion with others.
  - Automate implementation flow (source --> config file --> tool script --> product, a product can be the source of a next product)
  - Organize code in libraries using hdllib.cfg
  - Manage tool versions using hdltool_<toolset name>.cfg
  - Create project files for sim and synth
  - ARGS (Automatic Register Generation System using MM bus and MM register config files in yaml)
  - Create FPGA info (used to be called system info) address map stored in FPGA to allow dynamic definition of address maps. The definition
    of the MM register fields is kept in files because it typically remains fixed.
  - Easily enroll the environment on a new PC and introduce a new employee (to be done --> OpenCores, Ruud Overeem)
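
  How a tool script could turn per-library config files into a compile order can be sketched as follows.
  This is a minimal sketch under assumptions: the key names lib_name and lib_uses and the "key = value"
  layout are hypothetical placeholders and are not claimed to be the actual hdllib.cfg keys.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parse_cfg(text):
    """Parse 'key = value' lines into a dict; '#' starts a comment."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def compile_order(cfg_texts):
    """Order library names so that each library follows the ones it uses."""
    graph = {}
    for text in cfg_texts:
        cfg = parse_cfg(text)
        # lib_name and lib_uses are hypothetical keys used for illustration
        graph[cfg["lib_name"]] = cfg.get("lib_uses", "").split()
    return list(TopologicalSorter(graph).static_order())


# Three hypothetical library configs, deliberately listed out of order.
configs = [
    "lib_name = bf\nlib_uses = common dp",
    "lib_name = common  # no dependencies",
    "lib_name = dp\nlib_uses = common",
]
order = compile_order(configs)
```

  A dependency cycle between libraries makes TopologicalSorter raise graphlib.CycleError, which is a
  convenient early check before any project files are generated.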
  
5) Oneclick
  - OneClick is our umbrella name for new ideas and design methods, focus on firmware specification and design. New tools that are
    created within OneClick may end up as part of the RadioHDL environment. This has happened for example with ARGS.
  - The name OneClick relates to our 'goal at the horizon' to get in one click from design to realisation.
  - Automate design flow
  - Array notation (can be used in document and in code --> aims for simulatable specification)
  - Modelling in python of data move, DSP and control

6) New hardware, tools and languages
  - FPGA, GPU, DSP, ASIC
  - OpenCL
  - HLS
  - Compaan, Clash, Wavecore
    
7) Documentation
  - Documentation is needed to specify what we have to make
    . Detailed design document uses array notation to clearly describe all internal and external interfaces
    . Detailed design document also identifies test logic that is needed for the integration top level tests
  - No need to document what we have made, except for readme file and manuals
  - The code is self explanatory (with comments in docstring style using purpose and description)
  - The project scripts identify what is relevant for a product
  - The regression tests identify what is relevant code (if it is not tested it is not important and should not have been made)
  - It would be nice to have YouTube movies that show our workflow and boards

8) Project planning
  - Wild ass guess based on time logs of previous projects
  - Agile style with backlog, scrum and 3 week sprints (If it is not an allocated epic/story/task in Redmine then it will not be done).
  - Review process:
    . purpose is to ensure value and quality and to spread knowledge and awareness
    . coder works based on a ticket in Redmine, all production code must be reviewed by another team member
    . coder delivers code according to coding style, with purpose-description/docstring and with regression test
    . reviewer reviews code and function, reports via Redmine ticket
    . reviewer only reports
    . coder does corrections and merges branch to trunk
  - Roles within the team
  - Definition of done 
  - Outsourcing
  - Hiring temporary consultants
  - What maintenance support do we provide after a project has finished
    . firmware tends to become hardware in time, ' het verstaft'
    . using virtual machines (dockers) to bundle a complete set of operating system, tools and code for the future or to
      export as a starting point to an external party (e.g. for outsourcing)
  - What if we had 10 - 15 digital/firmware engineers instead of about 5 as now (Gijs, Leon, Pieter, Jonathan, Eric, Daniel)
    
9) Ethernet networks
  - 1GbE, 10GbE, 40GbE, 100GbE IP
  - Knowledge of switches
  - Knowledge of UDP, IP, VLAN
  - Monitoring and Control protocol (UniBoard, Gemini)
  - Streaming data offload
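
  For the streaming data offload, a capture-side check for lost packets could look like the sketch below.
  This is a minimal sketch under assumptions: the payload layout (a 64-bit big-endian block sequence
  number at the start of each UDP payload) and all names are hypothetical and do not describe an
  actual UniBoard or Gemini packet format.

```python
import struct

# Assumed layout: each UDP payload starts with a 64-bit big-endian block
# sequence number (BSN). This is a placeholder, not a documented format.
HEADER = struct.Struct(">Q")


def make_payload(bsn, data):
    """Build a test payload: BSN header followed by the data bytes."""
    return HEADER.pack(bsn) + data


def find_gaps(payloads):
    """Return (expected_bsn, received_bsn) for every jump in the sequence."""
    gaps = []
    expected = None
    for payload in payloads:
        (bsn,) = HEADER.unpack_from(payload)
        if expected is not None and bsn != expected:
            gaps.append((expected, bsn))
        expected = bsn + 1
    return gaps


# Simulated capture in which the packet with BSN 12 is missing.
capture = [make_payload(bsn, b"\x00" * 4) for bsn in (10, 11, 13, 14)]
gaps = find_gaps(capture)
```

  In practice the payloads would come from a packet capture rather than from make_payload, but the
  gap check itself stays the same.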
  
10) Outreach, collaborations, recruiting
  - Oliscience opencores
  - NWO digital special interest group
  - student assignments
  
11) RTSD pillars
  - All data storage
  
  
12) Version control
  - We use SVN and work on the trunk. This is feasible because we are a small team and it has the advantage that issues are noted
    at an early stage
  - Common practice in larger software development teams is that code is developed on branches and merged to the trunk after 
    it has been verified
  - In future use Git?
  
  
13) FPGA - GPU
  - ASTRON_MEM_193_Comparison_FPGA_GPU_switch
  - FPGA are good at:
    . can interface to ADC (not possible with GPU, so always need for a glue logic FPGA, but such an FPGA with gigabit
      transceivers is also capable of quite some processing)
    . can support many external IO ports via up to ~100 transceivers (GPU only a few fast external IO ports)
    . reorder blocks of data, e.g. in a packet payload
    . low latency applications (e.g. low latency trading), fixed latency (e.g. fast control loops, absolute timing)
    . embedded, standalone applications
    . have a lifetime / support time of > 10 years (GPU < 5 years)
  - GPU are good at:
    . more general to program
    . fast compile times (< minutes, versus > hours for FPGA)
    . uses floating point arithmetic by default (versus fixed point by default for FPGA)
    . matrix operations
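
  The difference between fixed point (FPGA default) and floating point (GPU default) can be made concrete
  with a small Python sketch. This is a generic illustration of signed fixed point quantization with
  rounding and saturation; the function names and the chosen widths are assumptions for the example.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a float to a signed fixed point integer code.

    The value is scaled by 2**frac_bits, rounded with Python's round()
    (ties go to the even integer) and saturated to the signed range
    of total_bits.
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(code, frac_bits):
    """Convert a fixed point integer code back to a float."""
    return code / (1 << frac_bits)


# 8-bit word with 7 fractional bits: range [-1, 1) in steps of 1/128.
code = to_fixed(0.3, frac_bits=7, total_bits=8)
error = abs(to_float(code, frac_bits=7) - 0.3)
```

  With 7 fractional bits the rounding error stays within half a step (1/256), while a value of 1.0 does
  not fit and saturates to the largest code, which is the kind of range planning that floating point
  largely hides.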