Yesterday morning while eating my “Free Wednesday Breakfast” chocolate croissant and fresh fruit with yoghurt, I watched an interview with John Nolan entitled “The State of Hardware Acceleration with GPUs/FPGAs, Parallel Algorithm Design.” In the spirit of giving back, I’m posting a few notes.
- When optimizing code for GPU, FPGA, or CPU, definitely focus on pipelining and overall throughput, not just local optimizations.
- There’s a trade-off between “faster” and “sooner.” It’s not always worth saving a few seconds (or even a few minutes) if the kernels take hours or days to compile. (Then again, sometimes it is.)
- Don’t rely solely on the language/compiler “stack” to remove inefficiencies. The optimizer does good work, but you can do things to help it. Think about the underlying hardware and architecture. It’s not a sin to reduce the amount of abstraction in the service of performance. Pay attention to things that affect processor pipelining and data movement through the cache.
- BTW, some languages and technologies exist to provide higher level programming that’s close to the hardware, but they’re proprietary, secret, or still in R&D.
- Use algorithmic optimization techniques. Step back and look for the computation that takes the least time overall.
- Avoid “if” statements where you can. The “goto” construct is considered harmful, but “if” is basically the same thing: a jump. Instead think about state machines and polymorphism. The argument is that there’s no branch-prediction penalty to pay, since the system “just is” in the state it’s supposed to be in. The logic is also clearer because there are no switches, which makes it easier to test.
- Don’t assume that floating-point values are always necessary. Integers can often be used creatively (fixed-point arithmetic, for example) and can be much faster for math than double-precision numbers.
- Of course, there’s a compromise between speedy/efficient and readable/maintainable.
- Aim to structure programs as “symbolic intent.” Mathematical descriptions are bad ways of expressing programs. Think about functional programming models instead of procedural.
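To make the note about avoiding “if” statements a little more concrete, here is a minimal sketch of a table-driven state machine. It is my own illustration, not code from the interview: the word-counting task and the names OUT, IN, NEXT_STATE, WORD_START, and count_words are all hypothetical. Python won’t show the hardware-level benefit; the point is only the shape of the logic, where lookups in a table replace the conditionals in the loop body.

```python
# Two states: outside a word, or inside one.
OUT, IN = 0, 1

# NEXT_STATE[state][char_class], where char_class is
# 0 for whitespace and 1 for anything else.
NEXT_STATE = (
    (OUT, IN),  # from OUT
    (OUT, IN),  # from IN
)

# WORD_START[state][char_class] is 1 only on the OUT -> IN transition,
# which is exactly when a new word begins.
WORD_START = (
    (0, 1),  # from OUT
    (0, 0),  # from IN
)


def count_words(text: str) -> int:
    """Count whitespace-separated words without an if statement in the loop."""
    state = OUT
    words = 0
    for ch in text:
        char_class = int(not ch.isspace())
        words += WORD_START[state][char_class]
        state = NEXT_STATE[state][char_class]
    return words


print(count_words("  two words "))
```

The behaviour lives entirely in the two tables, so testing it comes down to checking each table entry rather than tracing paths through nested conditionals.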
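The note about integers versus floating point can be sketched the same way. Below is a small fixed-point blend of two 8-bit pixel values, assuming 8 fractional bits for the weight. Again, this is my own example rather than anything shown in the interview, and FRAC_BITS, ONE, to_fixed, and blend are hypothetical names. Any speed-up depends on the target hardware; the sketch only shows that the arithmetic works with integers alone.

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS  # the value 1.0 in this fixed-point format


def to_fixed(x: float) -> int:
    """Convert a weight in [0.0, 1.0] to fixed point (done once, up front)."""
    return int(round(x * ONE))


def blend(a: int, b: int, weight: int) -> int:
    """Return weight * a + (1 - weight) * b using only integer operations.

    a and b are 8-bit pixel values; weight is fixed point in [0, ONE].
    Adding ONE >> 1 before the shift rounds to nearest instead of truncating.
    """
    return (a * weight + b * (ONE - weight) + (ONE >> 1)) >> FRAC_BITS


quarter = to_fixed(0.25)
print(blend(200, 100, quarter))  # floating point would give 0.25*200 + 0.75*100
```

The only floating-point operation is the one-time conversion of the weight; the per-pixel work is two multiplies, two adds, a subtract, and a shift.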
If you want to know more, you should definitely watch the half-hour interview. And if your reaction was more along the lines of “Yes, yes; that’s all true, and it’s how I design my image processing code,” then I definitely hope you’ll consider applying for the GPU/multicore engineering position we have open.