CGBN (Cooperative Groups Big Numbers, from NVlabs) is the de facto standard library for big-number arithmetic on NVIDIA GPUs. However, CGBN has severe performance issues on the Blackwell architecture (SM 12.0) ...