December 30, 2014 by Don Dingee

SoCs should invest in a strong cache position

by Don Dingee on 12-30-2014 at 4:00 pm
Categories: EDA

Like most technology firms, Apple has been home to many successes and some spectacular defeats. One failure was Project Aquarius. At the dawn of the RISC era, before ARM architecture was “discovered” in Cupertino, engineers were hunkered over a Cray X-MP/48. The objective was to design Apple’s own quad-core RISC processor to speed up the Macintosh.

As if designing an instruction set, execution units, and a pipeline were not hard enough, getting four cores to work together is more than simply a matter of cloning and connecting. Aquarius never got close to silicon. I’m guessing Apple ran head-on into the pitfalls of bus arbitration and cache coherency in multiprocessor scenarios. After three years of effort, Aquarius was scuttled, with Apple soon thereafter turning to IBM and Motorola for help in designing PowerPC.

The dream of a homegrown Apple quad-core processor didn’t die, but it did have to wait for technology to catch up. Fortunately for Apple and all SoC designers, ARM and others have since made tremendous progress on processor cores, bus interconnect, and cache coherency.

However, entering 2015 we are far from having all the issues around cache conveniently solved.

Why is cache coherency so hard to get right? I asked Bill Neifert of Carbon Design Systems that question, and he pointed me to an article he co-wrote recently with Adnan Hamid of Breker Verification Systems over in EETimes.

Fast, Thorough Verification of Multiprocessor SoC Cache Coherency

The good news is on the CPU side. ARMv8 IP has migrated toward a cluster strategy, building a quad-core complex with cache built in. Each processor core has its own L1 cache, and the cluster shares an L2 cache. Carbon has fast models for these clusters, and everything is great for verification using virtual prototypes.

Until designers stick that ARM cluster in an actual SoC design. Three things happen:

1) To differentiate SoC designs, folks are modifying the ARM CPU clusters.
2) To make SoC designs do actual work, folks are adding other types of IP cores.
3) To help performance, folks are adding L3 cache in the interconnect fabric.

As Neifert puts it, if you change it, by definition you break it. The first of these is manageable; changes in the CPU cluster are known, the model is updated, verification is run. Perhaps not simple, but straightforward, and the virtual prototyping tools for solving this are solid.

The second issue is also manageable, even for homegrown IP. Let’s assume each IP block in the design has an accurate model and flies through verification – again, non-trivial but achievable.

Now comes the third step. We put those blocks into a modeled interconnect, and … hey, why didn’t that work? The block-level functional verification effort was fine, but the nuances of system-level interaction and timing kick in. What was a perfectly accurate model at the block level may be inadequate at the system level. If the interconnect has L3 cache – and ARM CoreLink certainly does – system-level caching can quickly turn into an unbounded issue if there are any tweaks in IP.

Cache is a funny thing. IBM has a lot of experience with multicore caching, and they use the term “cache pressure.” If there were only one thread of execution using cache, things might behave as expected. As more tasks are added, at some point cache contention slows all the threads using the cache, not just the one experiencing a cache miss.
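To make the idea of cache pressure concrete, here is a minimal sketch of tasks sharing one LRU cache. It is a toy model, not IBM's definition or any real cache controller: the function name, the cache size, and the working-set size are all assumptions chosen to exaggerate the effect. Each task loops over its own working set, and accesses are interleaved round-robin.

```python
from collections import OrderedDict


def simulate_shared_cache(num_tasks, cache_lines=64, working_set=48, rounds=200):
    """Return the per-task hit rate for tasks sharing one LRU cache.

    Hypothetical toy model: each task loops over its own private working
    set of line addresses, and tasks are interleaved one access at a time.
    """
    cache = OrderedDict()  # keys are (task, line); insertion order tracks recency
    hits = [0] * num_tasks
    accesses = [0] * num_tasks
    for step in range(rounds * working_set):
        line = step % working_set
        for task in range(num_tasks):
            key = (task, line)
            accesses[task] += 1
            if key in cache:
                hits[task] += 1
                cache.move_to_end(key)  # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return [h / a for h, a in zip(hits, accesses)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        rates = simulate_shared_cache(n)
        print(n, "task(s):", [round(r, 3) for r in rates])
```

With one task, the 48-line working set fits in the 64-line cache and nearly every access hits. Add a second task and the combined working set no longer fits; with this cyclic access pattern LRU thrashes, and the hit rate collapses for both tasks, not just the newcomer. Real workloads degrade less abruptly, but the shape of the problem is the same.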

Expand that thought across a bunch of heterogeneous cores – CPU, GPU, DSP, PCIe, SATA, USB, Ethernet – each with their own L1/L2 and all using a sea of L3 cache in the interconnect. Cache implementations vary wildly; there are different line widths, update policies, and address maps to deal with. “Some problems only happen when mixing blocks with accurate models,” says Neifert, which qualifies for the understatement of the year.
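As one small illustration of why differing line widths complicate matters, the sketch below works out which cache lines of one agent overlap a single line of another. The helper names and the 32-byte and 64-byte line sizes are assumptions for illustration, not taken from any particular IP block.

```python
def lines_touched(addr, size, line_bytes):
    """Return the indices of the cache lines covering [addr, addr + size)."""
    first = addr // line_bytes
    last = (addr + size - 1) // line_bytes
    return list(range(first, last + 1))


def overlapping_lines(line_index, owner_line_bytes, other_line_bytes):
    """Lines in another agent's cache that overlap one line of the owner's cache."""
    base = line_index * owner_line_bytes
    return lines_touched(base, owner_line_bytes, other_line_bytes)


if __name__ == "__main__":
    # An agent with 64-byte lines updates its line 1 (bytes 64..127).
    # An agent with 32-byte lines must treat two of its own lines as affected.
    print(overlapping_lines(1, 64, 32))
    # In the other direction, one 32-byte line falls inside a single 64-byte line,
    # so the wider-line agent is affected even by bytes it never asked about.
    print(overlapping_lines(3, 32, 64))
```

Every coherency action on one agent's line can fan out to several lines elsewhere, or cover only part of a line elsewhere, and the interconnect has to get both cases right.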

Exposing these problems by hand is excruciating. Breker and Carbon have teamed up for one solution, using automatically generated test cases against a virtual prototype with 100% accurate models. This allows a robust set of test cases to execute cache stress tests against known-good IP block models in a system-level configuration. Leveraging the fast models in the Carbon CPAK also allows many tests to run in a reasonable amount of time.
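The general idea of generated cache stress tests can be sketched in a few lines. What follows is not the Breker or Carbon tooling and does not model any ARM protocol; it is a hypothetical toy MSI-style model (all class and function names are assumptions) driven by random reads and writes, with a reference copy of memory and a coherency invariant checked after every operation.

```python
import random


class ToyCoherentSystem:
    """Toy MSI-style model: every agent's cache holds a state and value per line."""

    def __init__(self, num_agents, num_lines):
        self.memory = [0] * num_lines
        self.state = [["I"] * num_lines for _ in range(num_agents)]
        self.value = [[0] * num_lines for _ in range(num_agents)]

    def _write_back(self, line):
        # Flush any Modified copy of the line to memory and downgrade it to Shared.
        for agent, states in enumerate(self.state):
            if states[line] == "M":
                self.memory[line] = self.value[agent][line]
                states[line] = "S"

    def read(self, agent, line):
        if self.state[agent][line] == "I":
            self._write_back(line)
            self.value[agent][line] = self.memory[line]
            self.state[agent][line] = "S"
        return self.value[agent][line]

    def write(self, agent, line, data):
        self._write_back(line)
        for other, states in enumerate(self.state):
            if other != agent:
                states[line] = "I"  # invalidate every other copy
        self.value[agent][line] = data
        self.state[agent][line] = "M"


class BrokenSystem(ToyCoherentSystem):
    def write(self, agent, line, data):
        # Bug injected on purpose: other agents' copies are never invalidated.
        self.value[agent][line] = data
        self.state[agent][line] = "M"


def run_stress(system, num_agents, num_lines, num_ops, seed):
    """Run random traffic; return the index of the first failing op, or None."""
    rng = random.Random(seed)
    expected = [0] * num_lines  # reference model: last value written per line
    for op_index in range(num_ops):
        agent = rng.randrange(num_agents)
        line = rng.randrange(num_lines)
        if rng.random() < 0.5:
            expected[line] = op_index + 1
            system.write(agent, line, op_index + 1)
        elif system.read(agent, line) != expected[line]:
            return op_index  # stale data observed
        holders = [states[line] for states in system.state]
        if holders.count("M") > 1 or ("M" in holders and "S" in holders):
            return op_index  # coherency invariant violated
    return None


if __name__ == "__main__":
    print("good model:", run_stress(ToyCoherentSystem(4, 8), 4, 8, 5000, seed=1))
    print("broken model:", run_stress(BrokenSystem(4, 8), 4, 8, 5000, seed=1))
```

The correct model survives thousands of random operations, while the deliberately broken one is caught within a handful. A real flow differs in every detail, but the division of labor is similar: a generator produces traffic no one would write by hand, and checks against a reference flag the first incoherent result.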

If all cache were created equal, we wouldn’t be having this discussion. Cache coherency in SoCs with many heterogeneous cores and a fabric interconnect is the frontier for SoC design. Neifert suggests there is a lot of “special sauce” being used right now, and teams are reluctant to share solutions – partly because the solution depends on their exact configuration. The ARM out-of-the-box IP is a good starting point, but given modifications and the incorporation of other IP from third-party and homegrown development, help is needed.

Don’t be like the John Sculley Apple. The autogenerated verification test bench described in the EETimes article is worth a look: it explores the issues in system-level multicore SoC cache coherency and an approach to uncovering them using fast models and virtual prototypes.
