ResearchAndDevelopment
Tensor-Core Correlator
Repository graph
Selected revision: cmake-build
Branches: master (default), amd-support, experimental-bulk-copies-backup, fix-opencl-test
Register OpenCLCorrelatorTest with CTest
fix-opencl-test
Fixed OpenCL test program.
Merge branch 'experimental-bulk-copies' into 'master'
master
Removed code in comments
Removed debug code.
Removed debug code.
Fixed approximates(), so that it now also catches sign errors.
Fixed fp8 conjugation bug.
Merge branch 'experimental-bulk-copies' of https://git.astron.nl/RD/tensor-core-correlator into experimental-bulk-copies
Added [[deprecated]] backward-compatible Correlator() constructor.
Removed white space.
Updated README.md for added fp4 support.
Removed pre-async-copies-era code.
No longer necessary to use an old cuda wrappers version.
More extensive and faster benchmarking.
Allow setting the number of thread blocks per SM.
Use std::optional<>. API change!
Print #registers (commented out).
Run fewer thread blocks / SM if not using async copies (due to register use).
Fixed default nrReceiversPerBlock.
Simplification.
Minor i4 optimization.
Changed int8 shape from m16n16k16 to m16n8k32.
Run ptxas only once.
Removed (broken) PORTABLE support.
Fixed Volta.
Fix !defined ASYNC_COPIES.
Fixed CP_ASYNC_BULK, but still disabled; it is slower than ASYNC_COPIES.
Simplified copyAsync().
Use CUDA_VERSION.
Sped up C++ verification code.
Added fp4 to benchmark.
Adapted usage() for fp4.
Fixes builds with CUDA 12.7 and older.
Added fp4 support.
API change: export nrTimesPerBlock.
Fixed i4.
Fixed fp8 conj_perm().
Slightly faster implementation of conj_perm on GH200.
Also use int2 loads for fp16 (matrix entries changed in K order).