In the old homogeneous world, scaling up meant adding more CPU cores, adding layers of cache, and improving memory bandwidth to support a multitasking software model. Multicore CPUs grew to the point where ample compute power existed, and I/O became the limiting factor. Intelligent peripherals with DMA and their own embedded cores arrived, offloading processing to handle high-speed I/O without interrupting the CPU cluster.
In the new heterogeneous world, scaling up means packing more IP blocks into an SoC design – with major mobile chipsets now somewhere in the neighborhood of 150 blocks. SoCs are now taking on more application spaces including automotive, augmented and virtual reality, cloud computing, IoT, wearables, and others. Specialized IP cores extend the roster well beyond just CPUs, with GPUs, DSPs, video and audio accelerators, network baseband processing, security and encryption blocks, and more.
Beyond a moderate level of complexity, interconnect and memory bandwidth become the limiting factors in most of these heterogeneous SoCs. CPU-centric cache coherency strategies no longer make sense, since the CPU cores are often oblivious to what is going on elsewhere in the SoC. The burden of cache coherency therefore shifts into the heterogeneous SoC interconnect, with every IP block acting as an agent in system-centric cache coherency services.
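The idea of interconnect-resident, system-centric coherency can be sketched as a directory that tracks, per cache line, which agents hold a copy and in what state. The sketch below is a deliberately simplified MSI-style illustration; the class and agent names are mine, not NetSpeed's API, and a real NoC would implement this in hardware with snoop filters and message channels.

```python
# Minimal sketch of directory-style coherency living in the interconnect,
# with heterogeneous IP blocks (CPU, GPU, DSP) as peer agents.
# Simplified MSI semantics; names and structure are illustrative only.

class CoherencyDirectory:
    """Tracks, per cache line, the exclusive owner and the set of sharers."""

    def __init__(self):
        self.lines = {}  # addr -> {"owner": agent or None, "sharers": set()}

    def read(self, agent, addr):
        entry = self.lines.setdefault(addr, {"owner": None, "sharers": set()})
        if entry["owner"] and entry["owner"] != agent:
            # Another agent holds the line Modified: snoop it back to Shared.
            entry["sharers"].add(entry["owner"])
            entry["owner"] = None
        entry["sharers"].add(agent)
        return "Shared"

    def write(self, agent, addr):
        entry = self.lines.setdefault(addr, {"owner": None, "sharers": set()})
        # Invalidate every other cached copy, then grant exclusive ownership.
        entry["sharers"].discard(agent)
        invalidated = sorted(entry["sharers"])
        entry["sharers"].clear()
        entry["owner"] = agent
        return invalidated

directory = CoherencyDirectory()
directory.read("cpu0", 0x1000)
directory.read("gpu", 0x1000)
victims = directory.write("dsp", 0x1000)  # invalidates cpu0's and gpu's copies
```

The point of the sketch is that no single CPU cluster arbitrates sharing: the directory in the fabric sees every agent's reads and writes, so a DSP write correctly invalidates copies held by the CPU and GPU.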
Cache rules in heterogeneous designs. Gains from multicore deployment can be completely undone if cores enter a vicious compute-flush-compute-flush cycle. Yet without system-centric attention to caching and coherency, a typical "divide and conquer" IP block approach optimizes individual blocks and hopes the integrated result is satisfactory. Interconnect is often overdesigned to avoid bottlenecks, at significant cost in area and power consumption. Coherent and non-coherent networks are often segregated, again in hopes of simplifying traffic patterns, but some use cases require the two sides to talk.
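How badly the compute-flush cycle erodes multicore gains can be seen with a back-of-the-envelope model: if every compute phase ends with a software-managed flush, and flushes from all cores contend on the fabric, effective speedup collapses well below the core count. The numbers below are illustrative assumptions, not measured silicon data.

```python
# Toy throughput model for the compute/flush cycle described above.
# All parameter values are illustrative assumptions, not measurements.

def effective_speedup(cores, compute_us, flush_us):
    """Ideal speedup from `cores`, degraded by a software-managed cache
    flush after every compute phase; flushes are modeled as serializing
    on the shared fabric, so their cost scales with the core count."""
    ideal_work = compute_us * cores              # work completed per cycle, ideally
    cycle_time = compute_us + flush_us * cores   # compute, then contended flushes
    return ideal_work / cycle_time

# With hardware coherency (no explicit flushes), scaling is linear:
ideal = effective_speedup(8, compute_us=100, flush_us=0)    # 8.0x
# With a 25 us flush per core per cycle, most of the gain evaporates:
degraded = effective_speedup(8, compute_us=100, flush_us=25)  # ~2.7x
```

Under these assumed numbers, eight cores deliver less than 3x, which is the motivation for moving coherency into the interconnect rather than flushing in software.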
Achieving performance gains of 10x or more at reasonable power consumption calls for a much smarter interconnect designed to handle multiple system use cases. While architects may have expertise in a few scenarios, automation using advanced modeling and machine learning techniques found in NetSpeed Systems’ Gemini coherent interconnect solution augments that expertise while still allowing system-centric configurability.
We introduced the machine-learning aspects of Gemini a few weeks ago (link below). At this week’s Linley Processor Conference, NetSpeed is introducing Gemini 3.0 with several new features including:
- Both the AMBA 5 CHI and AMBA 4 ACE interconnect protocols are supported. This is a huge differentiator when considering IP block reuse now and in the future.
- With advanced network-on-chip routers borrowing concepts from computer networking, Gemini 3.0 now supports broadcast and multicast modes, delivering significant gains in maintaining system-centric coherency.
- A single interconnect supports both cache-coherent and non-coherent agents: up to 64 cache-coherent blocks, plus up to 200 I/O-coherent and non-coherent blocks.
- System-level optimizations include an integrated DMA engine with pre-fetch, a configurable register bus layer for fail-safe system debug, enhanced Pegasus last level cache (LLC) configurable for coherent or memory cache, and architected-in functional safety features.
Gemini 3.0 offers a unique path to differentiation through optimizing a system-centric interconnect. Here's the full press release including industry endorsements:
NetSpeed Releases Gemini 3.0 Cache-Coherent NoC IP to Supercharge Heterogeneous SoC Designs
You can find the white paper here:
Achieving heterogeneous cache coherency through NetSpeed Gemini 3.0
For more background on the benefits machine learning provides for NetSpeed customers, here’s my earlier blog post:
SoC QoS gets help from machine learning