Current Work

I develop traditional and AI-based high-performance parallel algorithms for image enhancement and segmentation on NVIDIA GPUs. This includes:

1. Persistent kernels and grid-stride loops

2. Exploration and implementation of inter-block GPU synchronization (IBS) using (a) lock-based (i.e., atomic), (b) lock-free, (c) quasi, and (d) cooperative-groups approaches

3. Fast parallel implementation of recursion on an NVIDIA GPU using persistence, grid-stride loops, and IBS, which avoids intermediate transfers between the host (CPU) and the device (GPU)
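
As a rough illustration of how the three items above fit together, here is a minimal CUDA sketch, not the actual implementation. It assumes a toy 1D three-point smoothing filter as the iterated step. The names `smooth_persistent` and `smooth_on_device` are hypothetical, and only the cooperative-groups variant of IBS is shown. A single persistent kernel runs every pass: each pass is a grid-stride loop, and `grid.sync()` separates passes, so no data returns to the host between them.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical persistent kernel: runs `steps` smoothing passes in one launch.
__global__ void smooth_persistent(float* a, float* b, int n, int steps)
{
    cg::grid_group grid = cg::this_grid();
    float* src = a;
    float* dst = b;
    for (int s = 0; s < steps; ++s) {
        // Grid-stride loop: a fixed-size grid covers all n elements.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x) {
            float left  = src[i > 0 ? i - 1 : i];      // clamp at the borders
            float right = src[i < n - 1 ? i + 1 : i];
            dst[i] = (left + src[i] + right) / 3.0f;
        }
        // Inter-block synchronization: every block finishes this pass
        // before any block starts the next one.
        grid.sync();
        float* t = src; src = dst; dst = t;            // ping-pong buffers
    }
}

// Hypothetical host helper: smooths `signal` in place on device 0.
// Returns false if cooperative launch is unsupported or a CUDA call fails.
bool smooth_on_device(std::vector<float>& signal, int steps)
{
    const int threads = 256;
    int n = static_cast<int>(signal.size());
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    if (!prop.cooperativeLaunch) return false;

    // A cooperative grid must be fully co-resident, so size it by occupancy.
    int blocksPerSm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSm, smooth_persistent, threads, 0) != cudaSuccess)
        return false;
    int blocks = blocksPerSm * prop.multiProcessorCount;
    if (blocks < 1) return false;

    size_t bytes = signal.size() * sizeof(float);
    float* a = nullptr;
    float* b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&b, bytes) != cudaSuccess) { cudaFree(a); return false; }
    cudaMemcpy(a, signal.data(), bytes, cudaMemcpyHostToDevice);

    void* args[] = { &a, &b, &n, &steps };
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)smooth_persistent, dim3(blocks), dim3(threads), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // After an even number of passes the result is in `a`, otherwise in `b`.
    float* result = (steps % 2 == 0) ? a : b;
    if (err == cudaSuccess)
        err = cudaMemcpy(signal.data(), result, bytes, cudaMemcpyDeviceToHost);

    cudaFree(a);
    cudaFree(b);
    return err == cudaSuccess;
}

int main()
{
    std::vector<float> signal(1 << 20, 0.0f);
    signal[signal.size() / 2] = 1.0f;                  // unit impulse
    if (!smooth_on_device(signal, 8)) {
        std::printf("cooperative launch unavailable or CUDA error\n");
        return 1;
    }
    double sum = 0.0;
    for (float v : signal) sum += v;
    std::printf("sum after smoothing: %f\n", sum);
    return 0;
}
```

The sketch assumes a GPU and driver that support cooperative launch (the code checks `cooperativeLaunch` at run time) and a recent CUDA toolkit. The lock-based, lock-free, and quasi IBS variants would replace `grid.sync()` with a hand-written barrier and are not shown here.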