Commit 51819c68 authored by David Brouwer
The README is fully adapted for the Agilex7 and the most important results of the synthesis are reported.
parent 65afdd30
1 merge request: !363 Porting ram for Intel Agilex 7
README.txt for $HDL_WORK/libraries/technology/ip_agi027_xxxx/ram
VERSION 01 - 20231110
Contents:
1) RAM components
2) ROM components
3) Agilex7 IP
4) Inferred IP
5) Memory initialization file
6) Implementation options (LUTs or block RAM)
7) Synthesis trials
8) Agilex7 issues
9) References
1) RAM components:
Available:
ip_agi027_xxxx_ram_cr_cw = One read port with clock and one write port with clock and with separate address and same data width on both ports.
ip_agi027_xxxx_ram_crk_cw = One read port with clock and one write port with clock and with separate address and different data widths on both ports.
The data port widths maintain a power of two ratio between them.
ip_agi027_xxxx_ram_r_w = Single clock, one read port and one write port and with separate address and same data width on both ports.
ip_agi027_xxxx_ram_rw_rw = Two read/write ports each port with same clock and with separate address per port and same data width on both ports.
Unavailable:
ip_agi027_xxxx_ram_crw_crw = Two read/write ports each port with own port clock and with separate address and same data width on both ports.
For the Agilex 7 this IP can only be generated with 'Emulate TDP dual clock mode' enabled; what this entails is described
under '8) Agilex7 issues'. Because this option is mandatory, the IP cannot be used in the same way as for previous technologies.
ip_agi027_xxxx_ram_crwk_crw = Two read/write ports each port with own port clock and with power of two ratio between port widths.
Not available, because the Agilex 7 does not support ratio widths in combination with true dual port mode.
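The power of two ratio between the port widths of ip_agi027_xxxx_ram_crk_cw can be illustrated with a minimal Python sketch. It assumes
that both ports address the same number of memory bits, so that wr_nof_words * wr_dat_w = rd_nof_words * rd_dat_w. The function name
read_port_shape is hypothetical and not part of the library.

```python
def read_port_shape(wr_nof_words: int, wr_dat_w: int, rd_dat_w: int) -> tuple[int, int]:
    """Return (rd_nof_words, rd_dat_w) for a RAM with a power of two width ratio.

    Assumption: both ports see the same total number of memory bits.
    """
    if min(wr_nof_words, wr_dat_w, rd_dat_w) < 1:
        raise ValueError("sizes must be positive")
    wide, narrow = max(wr_dat_w, rd_dat_w), min(wr_dat_w, rd_dat_w)
    ratio, remainder = divmod(wide, narrow)
    # The ratio between the port widths must be 1, 2, 4, 8, ...
    if remainder != 0 or ratio & (ratio - 1) != 0:
        raise ValueError("port widths must have a power of two ratio")
    total_bits = wr_nof_words * wr_dat_w
    if total_bits % rd_dat_w != 0:
        raise ValueError("memory size does not fit a whole number of read words")
    return total_bits // rd_dat_w, rd_dat_w


if __name__ == "__main__":
    # 32 write words of 32 bits seen through a 64 bit read port.
    print(read_port_shape(32, 32, 64))
```

With 32 write words of 32 bits and a 64 bit read port this gives 16 read words, which matches the generics used in '7) Synthesis trials'.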
2) ROM components:
ip_agi027_xxxx_rom_r_w = Not available and not needed, because the ip_agi027_xxxx_ram_r_w can be used for ROM IP by not connecting the
write port. The IP could be created and then the vhd file can be derived from the generated HDL files and the
existing ip_stratixiv_rom_r.vhd file.
3) Agilex7 IP
The RAM IPs were ported manually from Quartus v19.4 for arria10_e2sg to Quartus 23.2 for agi027_xxxx by recreating them in Quartus
with the same parameter settings, using one of the following methods:
- method A:
. copy the original ip_arria10_e2sg_<ram_name>.vhd and ip_arria10_e2sg_<ram_name>.ip files.
. rename ip_arria10_e2sg_<ram_name>.ip and .vhd to ip_agi027_xxxx_<ram_name>.ip and .vhd (also replace the name inside the .vhd file).
. open the .ip file in Quartus 23.2.0 build 94, set device family to Agilex7 and device part to AGIB027R31A1I1VB.
Finish to let Quartus automatically convert it to the "new" IP, note differences such as version.
. then generate HDL (select VHDL for both sim and synth) using the Quartus tool or generate HDL in the build directory using the
terminal command generate_ip_libs <buildset> and finish to save the changes.
. compare the generated files to the copied .vhd file regarding version, library, generics, and ports. Make adjustments if
necessary to make it work.
. also git commit the ip_agi027_xxxx_<ram_name>.ip to preserve the original in case it needs to be modified.
- method B:
. copy the original ip_arria10_e2sg_<ram_name>.vhd file.
. rename ip_arria10_e2sg_<ram_name>.vhd to ip_agi027_xxxx_<ram_name>.vhd (also replace the name inside the .vhd file).
. open the ip_arria10_e2sg_<ram_name>.ip file in Quartus 19.4.0 build 64. No device family and device part need to be set.
. also open Quartus 23.2.0 build 94, set device family to Agilex7 and device part to AGIB027R31A1I1VB.
. select the corresponding IP in the IP catalog in Quartus 23.2.0 and provide the filename as ip_agi027_xxxx_<ram_name>.ip
Finish to let Quartus automatically convert it to the IP, note differences such as version.
. save the changes and then generate HDL (select VHDL for both sim and synth) using the Quartus tool or generate HDL in the build
directory using the terminal command generate_ip_libs <buildset> to finish it.
. compare the generated files to the copied .vhd file regarding version, library, generics, and ports. Make adjustments if
necessary to make it work.
. also git commit the ip_agi027_xxxx_<ram_name>.ip to preserve the original in case it needs to be modified.
this yields:
ip_agi027_xxxx_ram_cr_cw.ip
ip_agi027_xxxx_ram_crk_cw.ip
is derived from the ip_arria10_e2sg_ram_crwk_crw by modifying it to feature a single read and a single write port,
and incorporating a dual-clock design with distinct clocks for reading and writing.
ip_agi027_xxxx_ram_r_w.ip
ip_agi027_xxxx_ram_rw_rw.ip
is derived from the ip_arria10_e2sg_ram_crw_crw, incorporating the modification to operate with a single clock.
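The copy and rename steps that both methods start with could be scripted. The Python sketch below is only an illustration and was not
used for the port. It assumes the ip_arria10_e2sg prefix of the source IPs listed above, and the function name copy_and_rename is
hypothetical. Only the .vhd file gets its internal names replaced, the .ip file is copied as is and left to Quartus to convert.

```python
import shutil
from pathlib import Path

OLD_PREFIX = "ip_arria10_e2sg"  # assumed prefix of the original Arria10 files
NEW_PREFIX = "ip_agi027_xxxx"


def copy_and_rename(src: Path, dst_dir: Path) -> Path:
    """Copy src into dst_dir under the new prefix and return the new path."""
    if not src.name.startswith(OLD_PREFIX):
        raise ValueError(f"{src.name} does not start with {OLD_PREFIX}")
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name.replace(OLD_PREFIX, NEW_PREFIX, 1)
    if src.suffix == ".vhd":
        # Also replace the entity and component names inside the VHDL file.
        dst.write_text(src.read_text().replace(OLD_PREFIX, NEW_PREFIX))
    else:
        # Leave other files (e.g. the .ip file) untouched for Quartus to convert.
        shutil.copyfile(src, dst)
    return dst
```

The remaining steps (opening the .ip file in Quartus, generating HDL and comparing the result) stay manual.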
The IP only needs to be generated with generate_ip_libs <buildset> if it needs to be modified, because the ip_agi027_xxxx_ram_*.vhd
directly instantiates the altera_syncram component. The buildset for the agi027_xxxx is iwave.
The instantiation is copied manually from the ip_agi027_xxxx_ram_*/ram_2port_2040/sim/ip_agi027_xxxx_ram_*.vhd and saved in the
ip_agi027_xxxx_<ram_name>.vhd file. So the generated HDL files do not need to be kept, because they can easily be derived
from the IP file and will be generated in the build directory (under iwave/qsys-generate/) when using the terminal command
generate_ip_libs <buildset>.
It appears that the altera_syncram component can be synthesized even though it comes from the altera_lnsim package,
which is a simulation package. However, this resembles how it worked for Stratix IV with altera_mf.
4) Inferred IP
The inferred Altera code was obtained using template insert with Quartus 14.0a10. The IPs with different port widths,
like the ram_crk_cw, can not be inferred from RTL code.
For the RAM the g_inferred generic is set to FALSE because the inferred instances do not yet support g_init_file.
It is possible to init the RAM using a function e.g. see the README.txt for arria10. But this is probably not being
applied (for now) because it's easier to generate an IP and use the altera_syncram component. The inferred ones
require more effort to make them work, because the structure of the inferred Altera code should let Quartus know
to use a RAM block for implementation.
5) Memory initialization file
Often referred to as a .mif file. It is used to initialize the content of memory blocks within the design, specifying
the data to be stored in each memory location. This file must be included in the Quartus Projects. During synthesis,
the tool uses this file to initialize the memory blocks in the design.
Supporting g_init_file requires first reading the file in a certain format, by providing a file path as a string
that indicates the location of the file. This path is relative to the project folder in the build directory.
These files use the Intel hex standard and are word addressed (32 bits per address). For us an integer format or SLV
format with one value per line (line number = address) would be fine. Using SLV format is necessary if the RAM data
is wider than 32 bit, because the VHDL integer range only covers 2**32 values. The tb_common_pkg has functions to read such a file.
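As an illustration of the one value per line idea, the Python sketch below writes and reads such a file content. It assumes that the
SLV format is a string of '0' and '1' characters per line, which may differ from what the tb_common_pkg functions actually expect. The
function names are hypothetical.

```python
def to_slv_lines(values: list[int], dat_w: int) -> list[str]:
    """Return one binary string of dat_w characters per value (line number = address)."""
    lines = []
    for address, value in enumerate(values):
        if not 0 <= value < (1 << dat_w):
            raise ValueError(f"value at address {address} does not fit in {dat_w} bits")
        lines.append(format(value, f"0{dat_w}b"))
    return lines


def from_slv_lines(lines: list[str]) -> list[int]:
    """Parse the binary strings back into integers, skipping empty lines."""
    return [int(line.strip(), 2) for line in lines if line.strip()]


if __name__ == "__main__":
    # A 40 bit value does not fit in a VHDL integer, but is fine as a bit string.
    for line in to_slv_lines([0, 1, (1 << 40) - 1], 40):
        print(line)
```

Python integers are unbounded, so data wider than 32 bit needs no special handling on the generating side.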
Previously Quartus created a mif file from this when it inferred the RAM. However the UniBoard1 designs provided a mif
file that fits the RAM IP. Therefore it was easier initially to also use the RAM IP for Arria10, and this still holds,
also for the Agilex7. For RadioHDL a generic RAM init file format is preferable though. Currently the args tooling
with the command gen_rom_mmap.py (refer to [8]) generates the register map as a text file, compresses it, and
then creates the corresponding .hex or .mif file from it. For other RAM initialization we generate a hex file ourselves with
Python or with the Memory Initialization Tool that the Quartus Prime GUI provides. This tool allows you to
specify the initial contents of memories in your design visually.
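Generating a word addressed hex file with Python could look like the minimal sketch below. It is not the script used in the args
tooling. It assumes 32 bit words stored most significant byte first, one data record per word, and at most 65536 words, so that no
extended address records are needed. The function names are hypothetical.

```python
def hex_record(address: int, record_type: int, data: bytes) -> str:
    """Return one Intel hex record: ':' length address type data checksum."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address needs an extended address record, not covered here")
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + data
    # The checksum is the two's complement of the sum of all record bytes.
    checksum = (-sum(body)) & 0xFF
    return ":" + body.hex().upper() + f"{checksum:02X}"


def words_to_hex_lines(words: list[int], bytes_per_word: int = 4) -> list[str]:
    """Return hex file lines with one data record per word (word addressed)."""
    lines = [
        hex_record(address, 0x00, word.to_bytes(bytes_per_word, "big"))
        for address, word in enumerate(words)
    ]
    lines.append(hex_record(0, 0x01, b""))  # end of file record
    return lines


if __name__ == "__main__":
    print("\n".join(words_to_hex_lines([0x00000000, 0x12345678])))
```

The address field increments by one per word instead of per byte, which is what word addressed means here.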
6) Implementation options (LUTs or block RAM)
The IP (and also the inferred) RAM can be set to use LUTs (MLAB), block RAM (M20K) or LCs, however this is not supported yet.
. For IP RAM this would imply adding a generic to set the appropriate parameter in the altera_syncram
. For inferred RAM this would imply adding a generic to be used for the syntype attribute.
For an example see the README.txt for arria10.
7) Synthesis trials
All the synth .vhd files have been simulated and performed well.
The quartus/ram.qsf could be derived from the ip_arria10/ram/ folder and changed to only the following assignments:
set_global_assignment -name FAMILY "Agilex 7"
set_global_assignment -name DEVICE AGIB027R31A1I1VB
set_global_assignment -name LAST_QUARTUS_VERSION "23.2.0 Pro Edition"
set_global_assignment -name ERROR_CHECK_FREQUENCY_DIVISOR 256
set_global_assignment -name MIN_CORE_JUNCTION_TEMP "-40"
set_global_assignment -name MAX_CORE_JUNCTION_TEMP 100
set_global_assignment -name PWRMGT_VOLTAGE_OUTPUT_FORMAT "LINEAR FORMAT"
set_global_assignment -name PWRMGT_LINEAR_FORMAT_N "-12"
set_global_assignment -name POWER_APPLY_THERMAL_MARGIN ADDITIONAL
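To check that a derived ram.qsf contains the intended assignments, the file can be read back with a few lines of Python. This is a
minimal sketch and not part of the RadioHDL tooling. The function name parse_qsf_assignments is hypothetical, and only plain
'set_global_assignment -name <NAME> <value>' lines are handled, other lines are skipped.

```python
import shlex


def parse_qsf_assignments(text: str) -> dict[str, str]:
    """Return {name: value} for the simple global assignments in a .qsf text."""
    assignments = {}
    for line in text.splitlines():
        # shlex handles the quoted values and drops '#' comments.
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 4 and tokens[:2] == ["set_global_assignment", "-name"]:
            assignments[tokens[2]] = tokens[3]
    return assignments


if __name__ == "__main__":
    example = (
        'set_global_assignment -name FAMILY "Agilex 7"\n'
        "set_global_assignment -name DEVICE AGIB027R31A1I1VB\n"
    )
    print(parse_qsf_assignments(example))
```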
quartus_qsf_files = $HDL_WORK/libraries/technology/ip_agi027_xxxx/ram/quartus/ram.qsf could be added to the hdllib.cfg under
[quartus_project_file]. Use the terminal command quartus_config <buildset> to create/update all the project files for iwave.
The Quartus project ip_agi027_xxxx_ram.qpf from $HDL_BUILD_DIR/iwave/quartus/ip_agi027_xxxx_ram/ was used to verify that the block RAM IP
actually synthesizes to the appropriate FPGA resources. The current version of the inferred RAM is verified at arria10. Use the Quartus
GUI to manually select a top level component for synthesis e.g. by right clicking the entity vhd file in the file tab of the Quartus
project navigator window. For the (default) test condition the generics are set to 32 words memory size and 8 bits wide. They only differ
for crk_cw, where the generics are set to 32 words memory size and 32 bits wide for the write port, and 16 words memory size
and 64 bits wide for the read port. Then check the resource usage in the synthesis and fitter reports.
The most important information from these reports is (found under Place Stage > Resource Usage Summary / Resource Utilization by Entity):
. for g_nof_words equal to 32 and for g_dat_w equal to 8:
. one M20k block ram is used, but it is not completely filled. 8 * 32 = 256 block memory bits.
. no M20k block ram is used. Instead, 256 MLAB memory bits are used along with combinational ALUT usage and 8 memory ALUT usage.
. for g_nof_words equal to 1024 and for g_dat_w equal to 20, exactly one M20k block ram is used and filled completely.
20 * 1024 = 20480 block memory bits.
. for g_wr_nof_words equal to 32, g_wr_dat_w equal to 32, g_rd_nof_words equal to 16, and g_rd_dat_w equal to 64, two M20k block RAMs are
used, but they are not completely filled. Only 1024 block memory bits are used. 32 * 32 = 1024 block memory bits. A reasonable explanation
for this is that the data width is greater than 40 bits, which is the maximum data width with this memory size for one block ram. [2]
. the total M20k blocks is 13272. Thus the total block memory bits that is available is 13272 * 20480 = 271810560 when used optimally.
. no dsp blocks are used.
. the total dsp blocks on the device is 8528.
. the dedicated logic registers are 5 of the primary type.
. the total logic registers are 1825600 for each type.
. the used LABs is 5 (4 logic/1 memory).
. the total LABs on device is 91280.
. no ALMs needed for cr_cw, crk_cw and rw_rw.
. the ALMs needed [=A-B+C] for r_w is 13.
. the total ALMs on device is 912800.
. due to a critical warning that occurred during synthesis of cr_cw (refer to [4]), it was identified that the issue arises when it uses dual
clock in conjunction with the read_during_write_mode_mixed_ports => "OLD_DATA". According to the altera_syncram user guide, this configuration
is only supported when the same clock is used for both ports. Currently, the parameter value is still set to "OLD_DATA", because setting it
to "DONT_CARE" for the ip_arria10_e2sg_ram_crw_crw eliminates the warning, but the regression test then fails. Implementing this
correctly across all technologies requires additional effort. It is possible that this configuration may be applied in the future.
. due to the same parameter an error occurs for rw_rw. As a result the parameter value is now set to "DONT_CARE" instead of "OLD_DATA" to resolve
the error. [3]
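The memory bit counts in the list above can be reproduced with a small Python sketch. The estimate of the number of M20k blocks is only
an approximation: it assumes the block shapes 512 x 40, 1024 x 20 and 2048 x 10 (see [2]) and, for the ratio RAM, uses the shape of
the widest port. The fitter may decide differently, e.g. by using MLABs. The names in the sketch are hypothetical.

```python
import math

M20K_BITS = 20480
NOF_M20K_ON_DEVICE = 13272
# Assumed (depth, width) shapes of one M20k block, see [2].
M20K_SHAPES = [(512, 40), (1024, 20), (2048, 10)]


def memory_bits(nof_words: int, dat_w: int) -> int:
    """Return the number of memory bits that the RAM needs."""
    return nof_words * dat_w


def estimate_m20k_blocks(nof_words: int, dat_w: int) -> int:
    """Return the smallest number of M20k blocks over the assumed block shapes."""
    return min(
        math.ceil(nof_words / depth) * math.ceil(dat_w / width)
        for depth, width in M20K_SHAPES
    )


if __name__ == "__main__":
    for nof_words, dat_w in [(32, 8), (1024, 20), (16, 64)]:
        print(nof_words, dat_w, memory_bits(nof_words, dat_w), estimate_m20k_blocks(nof_words, dat_w))
    print(NOF_M20K_ON_DEVICE * M20K_BITS)
```

For the 16 words of 64 bits read port of crk_cw the estimate yields two blocks, because 64 bits exceed the 40 bit maximum width of one block.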
8) Agilex7 issues
The ip_agi027_xxxx_ram_crw_crw and *_crwk_crw are not (directly) available. The other .vhd synth files based on generated HDL files of the IPs did not
encounter any issues.
crw_crw (dual-clock-read-write port RAM):
-Cause:
Due to the error that occurs in the Quartus configuration (refer to [5]), the parameter "emulate TDP dual clock mode" needs to be enabled.
As a result, this synthesis file cannot easily be ported. While the file can be successfully configured, it cannot be used differently without
a significant latency. This limitation arises because the VHDL synthesis code of this IP must utilize the TDP dual clock emulator, which consists
of two DCFIFOs and a single RAM block. However, it is preferable to resolve this issue at a higher layer where the implementation occurs.
-Explanation:
According to the user manual of the Agilex 7, the following applies when you engage the TDP dual clock emulator feature (refer to [1]):
. the clock connection to port A must be a slow clock (clock A).
. the clock connection to port B must be a fast clock (clock B).
. the clock frequency ratio of clock B divided by clock A must be greater than or equal to seven.
. port A and port B will have different latency; the emulator can only be used with a minimum latency of five clock cycles (of clock A), which is significant.
. the latency for port A decreases as the difference between the two clock frequencies increases.
. the latency for port B is fixed to two clock cycles and the output registers are enabled for this configuration.
. the FIFO addresses clock domain crossing (CDC) issues for the control signals and serves as a temporary buffer for storing data before and after
being processed by the RAM block.
. the FIFO depth can be adjusted with the use of a generic.
. the FIFO depth must be a power of 2 and must exceed the clock frequency ratio (B/A) to ensure the proper functioning of the emulated TDP.
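The clock ratio and FIFO depth constraints listed above can be captured in a small Python check. This is a sketch based only on the
constraints as listed here, it is not derived from the Quartus IP itself, and the function names are hypothetical.

```python
def min_fifo_depth(clock_ratio: float) -> int:
    """Return the smallest power of 2 that exceeds the clock ratio B/A."""
    depth = 1
    while depth <= clock_ratio:
        depth *= 2
    return depth


def check_tdp_emulator(clk_a_mhz: float, clk_b_mhz: float, fifo_depth: int) -> list[str]:
    """Return the violated constraints, an empty list means the settings are allowed."""
    problems = []
    ratio = clk_b_mhz / clk_a_mhz
    if ratio < 7:
        problems.append("clock B / clock A must be greater than or equal to 7")
    if fifo_depth < 1 or fifo_depth & (fifo_depth - 1) != 0:
        problems.append("FIFO depth must be a power of 2")
    if fifo_depth <= ratio:
        problems.append("FIFO depth must exceed the clock ratio B/A")
    return problems


if __name__ == "__main__":
    print(check_tdp_emulator(50.0, 400.0, 16))
    print(check_tdp_emulator(100.0, 200.0, 8))
```

A design with clock domains that are less than a factor seven apart cannot meet these constraints at all, which is one more reason to
solve the dual clock issue at a higher layer.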
-Solution:
Therefore a newly created IP, ip_agi027_xxxx_ram_rw_rw, which is a single-clock dual-read-write RAM, is used instead of *_crw_crw, and the
dual clock issue is addressed at the higher-level layers where the implementation occurs. This is appropriate due to the structure of the HDL git repository.
For this new IP, tech_memory_ram_rw_rw is created, wherein rw_rw functionality is constructed for the previous technology identifiers using the crw_crw
IP synthesis files in only one clock domain by providing the same clock signal twice, so no new rw_rw IPs need to be generated for them.
The 'common_ram_rw_rw' and 'common_paged_ram_rw_rw' files had to be modified to facilitate the integration of this new RAM IP. Additionally, an extra
testbench is created to simulate the "paged" file by duplicating the '*_crw_crw' version. This adjustment was necessary because previously the
'rw_rw' files were built on the underlying 'common_(paged_)ram_crw_crw' files, and the usage has now been shifted to the 'rw_rw' files.
crwk_crw (dual-clock-read-write port with a power of two data width ratio):
-Cause:
Due to the errors that occur in the Quartus configuration (refer to [5], [6] and [7]), the ip_agi027_xxxx_ram_crwk_crw cannot be ported.
This IP has the same clocking issue as crw_crw, and has additional issues because true dual port RAM does not support different data widths
on both ports.
-Solution:
To facilitate a specific aspect of the functionality provided by crwk_crw, specifically its integration into common_ram_cr_cw_ratio, a new IP,
ip_agi027_xxxx_ram_crk_cw, is created instead of *_crwk_crw. It is a dual-clock simple dual port RAM (one read port and one write port). Unfortunately, there is no built-in
implementation or solution for achieving the same functionality as crwk_crw for backward compatibility with Arria10 in the Quartus tool.
This implies that a custom implementation must be created at higher-level layers to achieve this functionality.
For this new IP, tech_memory_ram_crk_cw is created, wherein crk_cw functionality is made compatible for the existing technology identifiers using the crwk_crw
IP synthesis files, by utilizing only the read port for one clock domain and only the write port for the other, eliminating the need to generate new crk_cw IPs for them.
The 'common_ram_cr_cw_ratio' file had to be modified to facilitate the integration of this new RAM IP. No additional testbench is created for simulation,
as there is also no testbench for the underlying 'common_ram_crw_crw_ratio' file that was utilized.
9) References:
[1] https://www.intel.com/content/www/us/en/docs/programmable/683241/23-2/true-dual-port-dual-clock-emulator.html
[2] https://www.intel.com/content/www/us/en/docs/programmable/683241/23-2/embedded-memory-configurations.html
[3] https://www.intel.com/content/www/us/en/docs/programmable/683241/23-2/mixed-port-read-during-write-mode.html
[4] Critical Warning(15003): "mixed_port_feed_through_mode" parameter of RAM atom gen_ip.u_altera_syncram|auto_generated|altera_syncram_impl1|ram_block2a5 cannot have value "old" when different read and write clocks are used.
[5] Error: In 'Clks/Rd, Byte En' tab. 'Emulate TDP dual clock mode' must be enabled if clocking method is 'Customize clocks for A and B ports' for Agilex 7 while using two read/write ports.
[6] Error: In 'Widths/Blk Type' tab, the valid ratio between widths of port A and port B is 1 for device family Agilex 7 while using two read/write ports.
[7] Error: In 'Widths/Blk Type' tab. 'Use different data widths on different ports' feature cannot be enabled as the valid ratio between port A and B must be 1 for Agilex 7 while using two read/write ports.
[8] ARGS tool script to generate fpgamap.py M&C Python client include file