
  • Processing Power Driving Practicality of Machine Learning

    Despite their recent rise to prominence, the fundamentals of AI, specifically neural networks and deep learning, were established as far back as the late 1950s and early 1960s. The first neural network, the Perceptron, had a single layer and was good at certain types of recognition. However, the Perceptron was unable to learn the XOR function, because XOR is not linearly separable. What eventually followed were multi-layer neural networks that performed much better at recognition tasks, but required more effort to train. Until the early 2000s, the field was held back by limitations that can be traced to insufficient computing resources and training data.
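The Perceptron's XOR limitation mentioned above can be demonstrated in a few lines. The sketch below (an illustration, not from the article) trains a single-layer perceptron with the classic perceptron learning rule: it converges on a linearly separable function like AND, but no choice of weights can classify all four XOR cases, so accuracy tops out below 100%.

```python
# Illustration: a single-layer perceptron learns AND but cannot learn XOR,
# because a single linear threshold cannot separate XOR's classes.

def step(x):
    # Heaviside step activation used by the classic Perceptron
    return 1 if x >= 0 else 0

def train_perceptron(samples, epochs=100, lr=0.1):
    # w[0], w[1] are input weights; w[2] is the bias term
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = step(w[0] * x1 + w[1] * x2 + w[2])
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            w[2] += lr * err
    return w

def accuracy(w, samples):
    hits = sum(step(w[0] * x1 + w[1] * x2 + w[2]) == t
               for (x1, x2), t in samples)
    return hits / len(samples)

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(accuracy(train_perceptron(and_gate), and_gate))  # converges to 1.0
print(accuracy(train_perceptron(xor_gate), xor_gate))  # stuck below 1.0
```

Adding a hidden layer (and a differentiable activation trained by backpropagation) is what removes this limitation, which is exactly the step the multi-layer networks described above took.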

    [Image: neural network complexity compared with animal brains]

    All this changed as chip speeds increased and the internet provided a rich set of images for use in training. ImageNet was one of the first significant sources of labeled images, the type needed to perform higher-quality training. Nevertheless, the theoretical underpinnings were established decades ago. Multi-layer networks proved much more effective at recognition tasks, and with them came additional processing requirements. So today we have so-called deep learning, which boasts many layers of processing.

    While neural networks provide a general-purpose method of solving problems that does not require formal coding, there are still many architectural choices needed to produce an optimal network for a given class of problems. Neural networks have relied on general-purpose CPUs, GPUs, or custom ASICs. CPUs have the advantage of flexibility, but this comes at the cost of lower throughput: loading and storing operands and results creates significant overhead. Likewise, GPUs are often optimized for local memory access and floating-point operations, which together do not always best serve deep learning requirements.

    The ideal hardware for a neural network is a systolic array, where data is moved directly from processing element to processing element. Also, deep learning has become very efficient with low-precision integer operations. So, it seems that perhaps ASICs might be the better vehicle. However, as the architectures of neural networks themselves evolve, an ASIC might prematurely lock in an architecture and prevent optimization based on real-world experience.
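The low-precision claim above is worth making concrete. The following sketch (hypothetical, not from the white paper) quantizes a floating-point dot product, the core operation of a neural network layer, down to 8-bit signed integers. The accumulation becomes pure integer multiply-accumulate work, the kind that maps cheaply onto FPGA DSP blocks or a systolic array of MACs, while the result stays close to the full-precision answer.

```python
# Hypothetical sketch: symmetric linear quantization of a dot product
# to int8, illustrating why low-precision integer MACs suffice for
# deep learning inference.

def quantize(values, bits=8):
    # Map floats onto signed integers in [-127, 127] (for bits=8)
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def int8_dot(a, b):
    aq, sa = quantize(a)
    bq, sb = quantize(b)
    # The accumulation itself is pure integer arithmetic
    acc = sum(x * y for x, y in zip(aq, bq))
    # One rescale at the end recovers a floating-point result
    return acc * sa * sb

a = [0.12, -0.5, 0.33, 0.9]
b = [0.7, 0.1, -0.2, 0.4]
exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b)
print(exact, approx)  # the two values agree closely
```

Real quantization schemes (per-channel scales, zero points, saturating accumulators) are more involved, but the principle is the same: the expensive inner loop needs only small integer multipliers, not floating-point units.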

    It turns out that FPGAs are a nice fit for this problem. In a recent white paper, Achronix points out the advantages that FPGAs bring to deep learning. The white paper, entitled The Ideal Solution for AI Applications: Speedcore eFPGAs, goes further to suggest that embedded FPGA is even more aptly suited to this class of problems. The paper starts out with an easily readable introduction to the history and underpinnings of deep learning, then moves on to the specifics of how processing power has created the revolution we are now witnessing.

    Yet conventional FPGA devices introduce their own problems. In many cases they are not optimally configured for specific applications; designers must accept the resource allocation available in commercially available parts. There is also the perennial problem of off-chip communication. Conventional FPGAs require moving data through I/Os onto board traces and then back onto the other chip. The round trip can be prohibitively expensive from a power and performance perspective.

    [Image: Achronix eFPGA avoiding off-chip I/Os]

    Achronix now offers embeddable FPGA fabric, which they call eFPGA. Because it is completely configurable, only the necessary LUTs, memories, DSP blocks, interfaces, etc. need to be included. And, of course, communication with other elements of the system is through direct bus interconnection or an on-chip NoC. This reduces the silicon needed for I/Os on both ends.

    [Image: Achronix YOLO example]

    The techniques and architectures used for neural networks are rapidly evolving, so design approaches that provide maximum flexibility for experimentation and evolution are valuable. Having the ability to modify the architecture can be crucial. Embedded FPGAs definitely have a role to play in this rapidly growing and evolving segment. The Achronix white paper is available on their website for engineers who want to look deeper into this approach.

    Read more about Achronix on