GPU acceleration for Surgical Eye Imaging

CUDA Implementation: results

nCenter of mass calculation CPU (3Mhz Xeon) versus CUDA code (executed on GTX260) on different stages of optimization

SIAM MI09

2.22

2.42

3.25

5.02

49.1

90.6

Time ms

CPU

A – initial port to CUDA

B – factoring out cycle invariant video memory allocation/de-allocation

C – three kernels (x and y of numerator and denominator) merged into single kernel

D – general optimization of conditional logic in the kernel

E – moving out cycle invariant computations into separate kernel