
  • Notes from the Neural Edge

    Cadence recently hosted a summit on embedded neural nets, the second in a series for them. This isn't a Cadence pitch, but it is noteworthy that Cadence is leading a discussion on what is arguably the hottest topic in tech today, with speakers of this range and expertise (Stanford, Berkeley, ex-Baidu, DeepScale, Cadence and more) and, at times, a standing-room-only crowd. It's encouraging to see them take a place at the big table; I'm looking forward to seeing more of this.

    [Image: image recognition]

    This was an information-rich event, so I can only offer a quick summary of highlights. If you want to dig deeper, Cadence has said they will post the slides within the next few weeks. The theme was embedding neural nets at the edge: in smartphones and IoT devices. I talked about this in an earlier blog. We can already do lots of clever recognition in the cloud, and training is done in the cloud. But, as one speaker observed, inference needs to run at the edge to be widely useful; its value is greatly diminished if you must go back to the cloud for each recognition task. So the big focus now is on embedded applications, particularly in vision, speech and natural language (I'll mostly use vision applications as examples in the rest of the blog). Embedded applications create new challenges: they must run at much lower power, perform well on limited resources, and be far more accessible to a wide range of developers.

    One common theme was the need for greatly improved algorithms. To see why, consider that recent deep nets can have ~250 layers. In principle each node in each layer requires a multiply-accumulate (MAC), and the number of nodes per layer may not be dramatically smaller than the number of pixels in an image, so a naive implementation needs to process billions of MACs per second. But great progress is being made. Several speakers talked about sparse matrix handling: many or most trained weights in real recognition networks are zero, so all of those operations can be skipped. Training downloads and update sizes can be massively reduced in the same way.
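    To make the arithmetic concrete, here is a minimal sketch of how skipping zero weights cuts the MAC count of a single layer. The layer size and the 90% sparsity level are illustrative assumptions on my part, not figures from the summit:

```python
import numpy as np

# Hypothetical fully-connected layer; sizes and sparsity level are illustrative.
in_features, out_features = 1024, 1024
rng = np.random.default_rng(0)

weights = rng.standard_normal((out_features, in_features))
weights[rng.random(weights.shape) < 0.9] = 0.0      # prune ~90% of weights to zero
activations = rng.standard_normal(in_features)

dense_macs = weights.size                            # naive: one MAC per weight
sparse_macs = np.count_nonzero(weights)              # zero weights can be skipped

# A sparse implementation only visits the nonzero entries.
rows, cols = np.nonzero(weights)
out = np.zeros(out_features)
for r, c in zip(rows, cols):
    out[r] += weights[r, c] * activations[c]         # one MAC per nonzero weight

print(f"dense MACs:  {dense_macs:,}")
print(f"sparse MACs: {sparse_macs:,} (~{sparse_macs / dense_macs:.0%} of dense)")
```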

    Then there's numerical precision. We tend to think that more is always better (64-bit floating point), especially in handling images, but apparently that has been massive overkill. Multiple speakers talked about representing weights as fixed-point numbers, and most were getting down to 4-bit widths. You might expect this to create massive noise in recognition, but it turns out the incremental accuracy gained above this precision is negligible, a result supported empirically and to some extent theoretically. One speaker even used ternary weights (-1, 0 and +1) successfully. These reductions further cut power and increase performance.
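    As an illustration of what this kind of quantization looks like, here is a small sketch of uniform 4-bit fixed-point and ternary quantization. The synthetic weight distribution and the ternary threshold heuristic are my assumptions, not a recipe any speaker presented:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000) * 0.1          # synthetic "trained" weights

def quantize_fixed_point(w, bits=4):
    """Uniform symmetric quantization onto a signed fixed-point grid."""
    levels = 2 ** (bits - 1) - 1                      # 7 levels per side for 4 bits
    scale = np.max(np.abs(w)) / levels
    codes = np.clip(np.round(w / scale), -levels, levels)
    return codes * scale                              # dequantized values

def quantize_ternary(w):
    """Ternary weights (-1, 0, +1) scaled by the mean magnitude of kept weights."""
    threshold = 0.7 * np.mean(np.abs(w))              # simple heuristic (assumption)
    codes = np.sign(w) * (np.abs(w) > threshold)
    alpha = np.mean(np.abs(w[codes != 0]))            # per-tensor scale factor
    return alpha * codes

for name, wq in [("4-bit", quantize_fixed_point(weights)),
                 ("ternary", quantize_ternary(weights))]:
    rms = np.sqrt(np.mean((weights - wq) ** 2))
    print(f"{name:7s} RMS quantization error: {rms:.4f}")
```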

    Another observation was that general-purpose algorithms are often the wrong way to go. General-purpose may be easier to implement, but an implementation can be far more efficient when tuned to a specific objective. A good example is image segmentation: localizing a lane on the road, a pedestrian, or a nearby car. For automotive ADAS applications the goal is to find bounding boxes rather than detailed information about an object, which makes recognition much more efficient. Incidentally, you might think optimizing power shouldn't be a big deal in a car, but I learned at this summit that one current autonomous system fills the trunk of a BMW with electronics and must cool down after two hours of driving. So I guess it is a big deal.
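    A rough back-of-the-envelope comparison shows why bounding boxes are so much cheaper to produce than per-pixel labels. The resolution, class count and object count below are illustrative assumptions:

```python
# Output-size comparison: dense segmentation vs. bounding boxes (illustrative numbers).
width, height = 1920, 1080
num_classes = 20           # assumed number of ADAS-relevant classes
num_objects = 10           # assumed cars/pedestrians/lanes per frame

# Dense semantic segmentation: one score per pixel per class.
segmentation_values = width * height * num_classes

# Bounding-box detection: 4 coordinates + class id + confidence per object.
bbox_values = num_objects * 6

print(f"per-pixel segmentation output: {segmentation_values:,} values")
print(f"bounding-box output:           {bbox_values:,} values")
```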

    [Image: DNN platform efficiency]

    What is the best platform for neural nets as measured by performance and power efficiency? It's generally agreed that CPUs aren't in the running; GPUs and FPGAs do better but are not as effective as DSPs designed for vision applications, and DSPs tuned to both vision and neural-net workloads do better still. As always, engines custom-designed for NN applications outperform everything else. Some of these can get interesting: Kunle Olokotun, a professor at Stanford, presented a tiled interleaving of memory processing units and pattern processing units as one approach, though custom hardware will need to show compelling advantages outside research programs. Closer to volume applications, Cadence showed several special capabilities they have added to their Vision P6 DSP, designed around minimizing power per MAC, minimizing data movement and maximizing MACs per second.
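    To see why minimizing data movement matters as much as minimizing compute, here is a toy energy model for one inference pass. The per-operation energies and traffic figures are order-of-magnitude assumptions for illustration only, not numbers from Cadence or the other speakers:

```python
# Toy energy model for one inference pass (all figures are assumptions).
PJ_PER_MAC = 1.0            # assumed energy per on-chip 16-bit MAC, in picojoules
PJ_PER_DRAM_BYTE = 200.0    # assumed energy per byte fetched from off-chip DRAM

macs = 1e9                  # assumed 1 GMAC per inference
dram_bytes = 50e6           # assumed 50 MB of weights/activations fetched off-chip

compute_mj = macs * PJ_PER_MAC * 1e-9           # picojoules -> millijoules
memory_mj = dram_bytes * PJ_PER_DRAM_BYTE * 1e-9

print(f"compute energy: {compute_mj:.1f} mJ")
print(f"DRAM energy:    {memory_mj:.1f} mJ")    # data movement dominates here
```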

    Another problem that got quite a bit of coverage was software development productivity and building a base of engineers skilled in this field. Google, Facebook and similar companies can afford armies of PhDs, but that's not workable for most product developers. A lot of work is going into democratizing recognition intelligence through platforms and libraries such as OpenCV, Vuforia and OpenVX. Stanford is working on OptiML to map parallel patterns onto different underlying hardware platforms in a re-targetable way. As for building a pool of skilled graduates, that problem seems to be solving itself: in the US at least, machine learning is apparently the fastest-growing specialization in undergraduate CS programs.
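    As a taste of how accessible these libraries make inference, here is a minimal sketch using OpenCV's dnn module. The model file name, input image, resolution and normalization are assumptions that depend on whichever pretrained network you load:

```python
import cv2
import numpy as np

# Load a pretrained classifier exported to ONNX (hypothetical file name).
net = cv2.dnn.readNetFromONNX("classifier.onnx")

image = cv2.imread("street_scene.jpg")                 # hypothetical input image

# Pack the image into the NCHW blob layout most nets expect; the 224x224 size,
# scaling and channel swap are assumptions tied to the chosen model.
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0 / 255.0,
                             size=(224, 224), swapRB=True)
net.setInput(blob)
scores = net.forward()

print("predicted class index:", int(np.argmax(scores)))
```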

    [Image: Pixel explosion in image sensors]

    AI was stuck for a long time in cycles of disappointment where results never quite rose to expectations, but neural nets have decisively broken out of that trap, generally meeting or exceeding human performance. Among many examples, automated recognition is now detecting skin cancers with the same accuracy as dermatologists with 12 years of training, and lip-reading solutions (useful for giving commands in a noisy environment) are recognizing sentences at better than 90% accuracy, compared with ~55% for human lip-readers. Perhaps most important, recognition is going mainstream. Advanced ADAS features such as lane control and collision avoidance already depend on scene segmentation. Meanwhile the number of image sensors already built surpasses the number of people in the world and is growing exponentially, implying that demand for automated recognition of many kinds is growing at a similar rate. Neural-net-based recognition seems to have entered a virtuous cycle, driving rapid advances of the kind listed here and rapid adoption in the market. Heady times for people in this field.

    You can learn more about Cadence vision solutions HERE.

    More articles by Bernard...