idg-bin/tests/python/cuda-unified.py seg faults during FFT
cuda-generic.py runs fine. Environment: Python 3.8.5, CUDA 11.2. Bram was able to reproduce it last Friday.
$ python cuda-unified.py
Error importing OpenCL: ('/opt/lib/libidg-opencl.so: cannot open shared object file: No such file or directory',)
>> Dataset full:
number of stations: 52
number of baselines: 1326
longest baseline = 1980.4 km
maximum grid size: 3772986368
longest baseline required: 29.12 km
>> Dataset limited to baseline up to 29.12 km:
number of stations: 52
number of baselines: 530
longest baseline = 28.6078 km
>> Dataset limited to 190 baselines:
number of stations: 52
number of baselines: 190
longest baseline = 28.6078 km
CUDA::default_info
Searching for source files in: /opt/lib/idg-cuda
Temporary files will be stored in: /tmp/idg-0mn9Wf
CUDA::CUDA
InstanceCUDA
set_parameters
compile_kernels
Searching for source files in: /opt/lib/idg-cuda
Temporary files will be stored in: /tmp/idg-gTmQyo
Compiling /tmp/idg-0mn9Wf/Splitter.cubin
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -DTILE_SIZE_GRID=128 -o /tmp/idg-0mn9Wf/Splitter.cubin /opt/lib/idg-cuda/KernelSplitter.cu
Compiling /tmp/idg-0mn9Wf/Calibrate.cubin
Compiling /tmp/idg-0mn9Wf/KernelFFTShift.cubin
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -o /tmp/idg-0mn9Wf/KernelFFTShift.cubin /opt/lib/idg-cuda/KernelFFTShift.cu
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -o /tmp/idg-0mn9Wf/Calibrate.cubin /opt/lib/idg-cuda/KernelCalibrate.cu
Compiling /tmp/idg-0mn9Wf/AverageBeam.cubin
Compiling /tmp/idg-0mn9Wf/Scaler.cubin
Compiling /tmp/idg-0mn9Wf/Gridder.cubin/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -o /tmp/idg-0mn9Wf/Scaler.cubin /opt/lib/idg-cuda/KernelScaler.cu
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -o /tmp/idg-0mn9Wf/AverageBeam.cubin /opt/lib/idg-cuda/KernelAverageBeam.cu
Compiling /tmp/idg-0mn9Wf/Adder.cubin
Compiling /usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -DTILE_SIZE_GRID=128 -o /tmp/idg-0mn9Wf/Adder.cubin /opt/lib/idg-cuda/KernelAdder.cu
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -DBATCH_SIZE=128 -o /tmp/idg-0mn9Wf/Gridder.cubin /opt/lib/idg-cuda/KernelGridder.cu
/tmp/idg-0mn9Wf/KernelWtiling.cubin
Compiling /tmp/idg-0mn9Wf/Degridder.cubin
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -DBATCH_SIZE=256 -o /tmp/idg-0mn9Wf/Degridder.cubin /opt/lib/idg-cuda/KernelDegridder.cu
/usr/local/cuda/bin/nvcc -cubin -use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include -o /tmp/idg-0mn9Wf/KernelWtiling.cubin /opt/lib/idg-cuda/KernelWtiling.cu
CUDA::initialize_buffers
CUDA::free_buffers
Devices:
GeForce RTX 3090
Device memory : 12241 Mb / 24268 Mb (free / total)
Shared memory : 48.00 Kb
Clk frequency : 1695 Ghz
Mem frequency : 9751 Ghz
Number of SM : 82
Mem bus width : 384 bit
Mem bandwidth : 936 GB/s
Number of threads : 1536
Capability : 86
Unified memory : 1
Compiler flags:
-use_fast_math -G -src-in-ptx -arch=sm_86 -DNR_POLARIZATIONS=4 -I/opt/include
Generic::Generic
Unified::Unified
nr_stations = 20
nr_baselines = 190
nr_channels = 1
nr_timesteps = 7200
nr_timeslots = 16
nr_correlations = 4
subgrid_size = 24
grid_size = 2048
image_size = 0.0592
kernel_size = 13
integration_time = 0.9
Plan::Plan
Plan::initialize
kernel_size : 13
subgrid_size : 24
grid_size : 2048
nr_baselines : 190 (input)
nr_timesteps : 7200 (per baseline)
nr_channels : 1 (per baseline)
nr_visibilities : 1368000 (planned)
nr_subgrids : 1827 (planned)
Unified::do_gridding
### Initialize gridding
CUDA::initialize
CUDA::compute_jobsize
CUDA::cleanup
CUDA::initialize_buffers
CUDA::free_buffers
nr_stations = 20
nr_timeslots = 16
nr_timesteps = 7200
nr_channels = 1
subgrid_size = 24
nr_baselines = 190
max_jobsize = 0
Bytes required for static data: 11446276
Bytes required for job data: 1742880
Bytes free: 12836208640
Bytes reserved: 5134483456
Jobsize: 190
### Run gridding
Generic::run_gridding
CUDA::do_transform
Segmentation fault (core dumped)
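
The log stops at CUDA::do_transform with no Python-side traceback. One way to see which Python call was active when the crash happened is to rerun the test with the standard-library faulthandler enabled. The snippet below is only a sketch under that assumption: the helper names `faulthandler_command` and `run_with_faulthandler` are hypothetical and not part of IDG, and it only shows the Python frames, not the native stack inside the CUDA proxy.

```python
import subprocess
import sys


def faulthandler_command(script, python=sys.executable):
    """Build a command line that runs `script` with faulthandler enabled.

    `-X faulthandler` is a standard CPython option; on SIGSEGV the
    interpreter dumps the Python traceback of every thread to stderr.
    """
    return [python, "-X", "faulthandler", script]


def run_with_faulthandler(script):
    """Run `script` in a child process and return (returncode, stderr).

    On POSIX a negative return code means the child was killed by a
    signal, e.g. -11 for SIGSEGV.
    """
    result = subprocess.run(
        faulthandler_command(script),
        capture_output=True,  # keep stdout/stderr instead of printing them
        text=True,
    )
    return result.returncode, result.stderr
```

Calling `run_with_faulthandler("cuda-unified.py")` from idg-bin/tests/python should, if the crash reproduces, return a negative exit code together with a "Fatal Python error: Segmentation fault" dump in stderr naming the Python line that called into the library. The same effect is available directly from the shell with `python -X faulthandler cuda-unified.py`.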