Designing hardware accelerators for neural network (NN) applications involves walking a tightrope between the competing demands of low power, high accuracy and high throughput. NVIDIA's Jetson is a promising platform for embedded machine learning that seeks to balance these objectives.

Our survey reviews more than 110 works that evaluate and optimize neural network applications on the Jetson platform. We review both the hardware and the algorithmic optimizations used for running NN algorithms on Jetson, and we show the real-life applications in which these algorithms have been deployed. We also review works that compare Jetson with similar platforms such as FPGAs and the Raspberry Pi.

Many of the ideas and optimizations discussed in the survey also apply to existing and future embedded systems. The survey seeks to show that a machine learning algorithm is chosen not only for its accuracy but also for how well it fits the resource constraints of the hardware platform. In fact, the hardware platform can force fundamental changes in the algorithm design.

PS: The survey covers works that use Jetson TK1, TX1 and TX2, but not Xavier, since Xavier was released only very recently.