ResearchAndDevelopment
Tensor-Core Correlator
Repository graph
Selected revision: cmake-build
Branches (3): master (default), amd-support, experimental-bulk-copies
Fixed fp8 conj_perm()
experimental-bulk-copies
Run ptx assembler right after compilation.
master
Temporarily revert to cudawrappers 0.9.0.
Merge branch 'use-cudawrappers-0.9.0' into 'master'
Use cudawrappers 0.9.0
Slightly faster implementation of conj_perm on GH200.
Also use int2 loads for fp16 (matrix entries changed in K order)
Reordered A and B matrix along K axis to optimize memory accesses.
Fixed e4m3/e5m2 support. Matrix A instead of B fixes the complex numbers.
Removed some implicit assumptions that NR_RECEIVERS_PER_TCM_Y equals 8.
More efficient B matrix ordering for fp16 and i8; breaks fp8 and i4.
Added i8.
Merge branch 'fp8' into 'master'
Fp8
Initial experiments. Works only for e4m3 and i4.
Added e4m3 benchmark.
Fixed module environment.
Updated for e4m3/e5m2 support.
Adapted tests to the new input format argument.
Adapted "usage" line.
Added e5m2 support.
Added e4m3 support.
Added support for sm_120 (consumer-grade Blackwell)
Provide enough parallelism for benchmarking.
Merge branch 'fix-pmt' into 'master'
Update to new PMT::Create interface
storeVisibility() now has recvX, recvY as arguments, instead of baseline number.
Add HIP launch bounds
amd-support
Remove temporary syncthreads
Implement AMD visibility store with warp shuffle such that one thread can store full visibilities. Code to be simplified/cleaned up
Implement direct write of visibilities from registers on AMD GPUs
Revert "Cleanup"
Merge branch 'amd-support' of git.astron.nl:RD/tensor-core-correlator into amd-support
Cleanup
Fix PMT ROCm support
AMD arch check only on device compile
Add test with more channels
Cleaner fix for wave64
Make correlator tests fail when verification fails
Fix threadIdx-based offset calculation for wave64 in 8/16bit mode