  • Stress and Aging

    These failings aren’t just a cross we humans bear; they’re also a concern for chips, particularly electrical over-stress (EOS) and aging of the circuitry. Such concerns are not new, but they are taking on new urgency given the high reliability and long lifetimes we expect from safety-critical components in cars and from maintenance-free equipment in many industrial applications.

    EOS is generally grouped with two other failure modes: electromigration (EM) and electrostatic discharge (ESD). EM is primarily about current: inrush when a power domain switches on, or localized high activity, can knock metal atoms in the interconnect aside, and over time the damage accelerates through increased resistive heating. ESD and EOS are two ends of a spectrum related to high electric fields (modest voltage differences over tiny distances), which cause dielectric breakdown – in gate oxide, for example. ESD is normally associated with external static shocks propagating into a device – big spikes but short duration.

    EOS is generally associated with misbehavior in the “normally” operating circuit, often a result of incorrect handling at voltage- or power-domain crossings where isolation or level-shifting was poorly implemented. These events can persist for much longer than ESD events. João Geada, Chief Technologist at ANSYS, told me that getting to a bad state doesn’t take much; for ultra-low-power HVT transistors, 0.1V (or less) outside the operating range starts to cause damage. All three failure modes become more acute in advanced technologies, where metal widths and dielectric thicknesses have become so tiny.
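
    As a minimal sketch of what such a check looks for (not a description of how any ANSYS tool works), the snippet below scans a sampled node-voltage waveform for excursions outside an allowed operating range and reports how long each one lasts. The function name `find_overstress`, the sample data, and the 0V to 0.8V range are all hypothetical.

```python
def find_overstress(samples, v_min, v_max):
    """Group consecutive out-of-range samples into over-stress events.

    samples: iterable of (time, voltage) pairs in time order (hypothetical data).
    Returns a list of (start_time, end_time, peak_excursion_volts).
    """
    events = []
    start = None      # time of the first sample of the current event
    last_t = None     # time of the most recent out-of-range sample
    peak = 0.0        # largest excursion seen in the current event
    for t, v in samples:
        # How far the voltage lies outside [v_min, v_max]; zero when inside.
        excursion = max(v - v_max, v_min - v, 0.0)
        if excursion > 0.0:
            if start is None:
                start, peak = t, excursion
            else:
                peak = max(peak, excursion)
            last_t = t
        elif start is not None:
            events.append((start, last_t, peak))
            start = None
    if start is not None:  # waveform ended while still out of range
        events.append((start, last_t, peak))
    return events


if __name__ == "__main__":
    # Hypothetical waveform: an overshoot above 0.8V, then an undershoot below 0V.
    waveform = [(0, 0.5), (1, 0.95), (2, 1.0), (3, 0.6), (4, -0.2), (5, 0.1)]
    for start, end, peak in find_overstress(waveform, v_min=0.0, v_max=0.8):
        print(f"over-stress from t={start} to t={end}, peak excursion {peak:.2f} V")
```

    A real check would of course work on simulated transistor terminal voltages under realistic activity, which is exactly why this kind of analysis needs fast, accurate simulation.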

    You really want to check for problems like this at the implementation/transistor level, and you have to take into account that EOS problems are activity-dependent. One approach is to analyze statically (a bit like STA), but João says that tends to be very noisy (unsurprising). Dynamic analysis avoids the noise problem; that’s where ANSYS’ FX (super-fast Spice-accurate simulation) comes in. ANSYS has introduced an application, EOS FX, specifically to meet this need. EOS FX is a full block/IP solution which automates input vectors to cover all scenarios for possible EOS failures.

    Aging is a different problem. We’ve all experienced this – you buy an electronic device, it works perfectly for a couple of years, then it jams for no obvious reason. You power-cycle it and it works fine again; but over time you find it jams more frequently and you have to power-cycle more and more often. A major root cause is something called negative-bias temperature instability (NBTI), which occurs when (stable) electric fields are applied for a long time across a dielectric. (Another contributor is hot carrier injection, which I will skip here.)

    NBTI causes threshold voltages to increase over time, which in turn causes propagation delays to increase over time. The effect is limited if those fields cycle frequently, as for a free-running clock for example, but has a bigger impact in circuitry which rests in a stable state for extended periods. Curiously, clock gating exacerbates this problem. When the clock is gated off, state and therefore electric fields throughout downstream logic are fixed, giving free rein to NBTI to age that logic. This problem could be fixed by power gating the logic (turning off the power makes the electric fields go to zero) but that has downsides too (power-on latency, inrush, …). Oh well.

    The impact of this aging is both circuit- and activity-dependent and will vary across the device; different parts of a design will age at different rates. Since aging increases delays, you should expect that paths which passed as non-critical for the new device will over time drift closer to critical or will become critical (hello device hangs). How likely this is to happen depends on how much margin you built into path slacks. You could also use adaptive voltage scaling (AVS) to compensate for delay degradation. Either way, you can’t afford to build enough slack into every path or put AVS islands everywhere, so you have to be selective, which again requires detailed transistor/implementation analysis with activity.
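
    To make the slack argument concrete, here is a small sketch that applies a per-path aging derate to fresh path delays and lists the paths whose slack would fall below a guard band. The `Path` fields, the derate values, and the path names are hypothetical, and the sketch assumes slack erodes by exactly the increase in path delay (that is, the clock path itself does not age).

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    delay_ns: float      # fresh (time-zero) path delay
    slack_ns: float      # fresh timing slack
    aging_derate: float  # assumed fractional delay increase at end of life


def aged_slack(path):
    """Slack left after the path delay has grown by its aging derate."""
    return path.slack_ns - path.delay_ns * path.aging_derate


def paths_at_risk(paths, guard_band_ns=0.0):
    """Paths whose aged slack falls below the guard band, worst first."""
    risky = [p for p in paths if aged_slack(p) < guard_band_ns]
    return sorted(risky, key=aged_slack)


if __name__ == "__main__":
    # Hypothetical paths: logic behind a clock gate is given a larger derate
    # than logic that toggles continuously.
    paths = [
        Path("clk_gated_ctrl", delay_ns=2.0, slack_ns=0.15, aging_derate=0.10),
        Path("free_running_dp", delay_ns=2.4, slack_ns=0.05, aging_derate=0.01),
        Path("iso_wakeup", delay_ns=1.5, slack_ns=0.30, aging_derate=0.12),
    ]
    for p in paths_at_risk(paths):
        print(f"{p.name}: aged slack {aged_slack(p):+.3f} ns")
```

    Note that the path with the smallest fresh slack is not necessarily the one that fails first; a path with comfortable margin but heavy stress can overtake it. That is why a uniform margin is a poor substitute for activity-aware analysis.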

    ANSYS now offers FX Aging for this analysis. I believe this is the only detailed dynamic aging analysis on the market. The application builds a database of signal transition probabilities (from use-cases I believe) and propagates these through the circuit, looking for long term aging-stress candidates. It also looks at potentially infrequent events (possibly not covered in the use-cases), such as clock gating turn-on and isolation, which are particularly at-risk candidates. Stress conditions are propagated down to the transistor-level to ensure an accurate view of differential aging across the design.
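
    The idea of propagating activity statistics through a circuit can be illustrated with a toy example. This is not the FX Aging algorithm; it is a simplified sketch that propagates static signal probabilities (the fraction of time a net sits at logic 1) rather than transition probabilities, and it assumes gate inputs are statistically independent. It then estimates, for each net, the fraction of time a PMOS transistor gated by that net would be under NBTI stress, which is when its gate is held low. The netlist format and net names are hypothetical.

```python
def propagate_probabilities(primary_inputs, gates):
    """Propagate P(net == 1) through a topologically ordered gate list.

    primary_inputs: dict of net name -> probability of being logic 1.
    gates: list of (output_net, kind, input_nets), kind in NOT/AND/OR.
    Inputs of each gate are assumed independent (a simplification).
    """
    prob = dict(primary_inputs)
    for out, kind, ins in gates:
        p = [prob[n] for n in ins]
        if kind == "NOT":
            prob[out] = 1.0 - p[0]
        elif kind == "AND":
            all_high = 1.0
            for x in p:
                all_high *= x
            prob[out] = all_high
        elif kind == "OR":
            none_high = 1.0
            for x in p:
                none_high *= 1.0 - x
            prob[out] = 1.0 - none_high
        else:
            raise ValueError(f"unsupported gate kind: {kind}")
    return prob


def nbti_stress_fraction(prob):
    """Fraction of time a PMOS gated by each net sees a low gate (NBTI stress)."""
    return {net: 1.0 - p for net, p in prob.items()}


if __name__ == "__main__":
    # Hypothetical use-case: an enable that is high only 5% of the time.
    inputs = {"clk_en": 0.05, "d": 0.5}
    netlist = [
        ("g1", "AND", ("clk_en", "d")),
        ("g2", "NOT", ("g1",)),
    ]
    stress = nbti_stress_fraction(propagate_probabilities(inputs, netlist))
    for net, frac in sorted(stress.items(), key=lambda kv: -kv[1]):
        print(f"{net}: PMOS stressed {frac:.1%} of the time")
```

    Even in this toy form, the rarely asserted enable shows up as a long-term stress candidate downstream, which matches the intuition about gated logic aging faster.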

    [Figure: HiSilicon example]

    How big a deal is this? Various presentations (e.g. this) show threshold voltages increasing from 130mV to 210mV over a period of 5 years. When the operating voltage is 1V (or lower), that can have a huge impact on propagation delays. What makes both EOS FX and FX Aging possible is FX’s ability (on which I have written before) to do Spice-accurate path delay analyses orders of magnitude faster than Spice. That speed and accuracy allow for confident EOS and aging-based signoff in practical run-times. You can learn more about these ANSYS products HERE.
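
    For a rough feel of what a 130mV to 210mV threshold shift does to delay, the sketch below uses the alpha-power law, a standard first-order model in which gate delay is proportional to Vdd / (Vdd - Vth)^alpha. The model and the exponent alpha = 1.3 are my assumptions for illustration, not something taken from the presentations or from ANSYS, and real results depend on the process and the circuit.

```python
def delay_ratio(vdd, vth_fresh, vth_aged, alpha=1.3):
    """Aged delay divided by fresh delay under the alpha-power law.

    delay ~ vdd / (vdd - vth) ** alpha, so at a fixed supply the vdd term
    cancels and only the gate overdrive (vdd - vth) matters.
    alpha = 1.3 is an assumed short-channel value, not a measured one.
    """
    if vdd <= max(vth_fresh, vth_aged):
        raise ValueError("supply must exceed the threshold voltage")
    return ((vdd - vth_fresh) / (vdd - vth_aged)) ** alpha


if __name__ == "__main__":
    # Threshold shift quoted in the text: 130mV fresh, 210mV after 5 years.
    for vdd in (1.0, 0.8, 0.7):
        ratio = delay_ratio(vdd, vth_fresh=0.13, vth_aged=0.21)
        print(f"Vdd = {vdd:.1f} V: delay grows by about {100 * (ratio - 1):.0f}%")
```

    Under these assumptions the delay increase at 1V is on the order of 13%, and it grows as the supply is lowered because the same 80mV shift eats a larger share of the overdrive. That is more than enough to consume the slack on a path that was only comfortably non-critical when new.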