As header-only libraries based on CUTLASS, the MathDx libraries (e.g. cuBLASDx) can only be used from C++.
It would be nice to have, for example, the new DGEMM emulation via IMMA (https://github.com/NVIDIA/CUDALibrarySamples/tree/master/MathDx/cuBLASDx/16_dgemm_emulation) available inside Julia kernels.
Unfortunately, Warp and Numba can only JIT-compile a subset of Python types, not the more general Julia types that we need (e.g. for differential equation solvers).
A potential way forward would be to compile libmathdx to PTX, link it in via LLVM, and get full performance via LTO. However, the resulting artifacts often contain NVVM IR, which LLVM cannot handle.
Related discussion: https://discourse.julialang.org/t/using-cublasdx-in-julia/125527