
  • Optimizing SRAM IP for Yield and Reliability

    My IC design career started out with DRAM at Intel, and included SRAM embedded in GPUs, so I recall some common questions that face memory IP designers even today, like:

    • Does reading a bit flip the stored data?
    • Can I write both 0 and 1 into every cell?
    • Will read access times be met?
    • When the supply voltage is lowered, does the cell retain its data?
    • How does my memory perform across variation?

    If you are buying your SRAM IP then maybe you don't have to be so concerned about these questions; however, the circuit designers responsible for designing and verifying memory IP are very focused on answering them. Consider the typical six-transistor SRAM bit cell in a CMOS technology:

    [Image: memory-cell.jpg]

    The basic memory element has a pair of cross-coupled inverters (devices PD1, PL1 and PD2, PL2) to store a 1 or 0; to read or write the cell you activate the Word Line (WL), which turns on NMOS devices PG1 and PG2. A circuit designer chooses device sizes and can start to optimize this SRAM bit cell by running a worst-case analysis, where one objective is to find the variation that minimizes the current through the read path shown by the red arrow above. During this worst-case analysis the Vth values for devices PG1 and PD1 are adjusted iteratively, your favorite SPICE circuit simulator is run, and the results are then determined by a tool like WiCkeD from MunEDA:

    [Image: wca.jpg]

    The axes of this plot are the variations in Vth for devices PG1 and PD1, while the circles mark the amount of variation measured in sigma. Contours of constant read current are shown as dashed lines. To find the minimum read current we look at where the dashed lines meet the circles; the red dot marks our worst-case point within a 6 sigma distance.
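
    The search described above can be illustrated with a minimal Python sketch. The linear `read_current` model and its coefficients are made-up stand-ins for a SPICE run, and the plain scan around the 6 sigma circle is only an illustration; it is not the algorithm that WiCkeD uses.

```python
import math

def read_current(d_pg1, d_pd1):
    """Hypothetical read-current model in uA.

    Inputs are Vth shifts of PG1 and PD1 in sigma units; a higher Vth
    weakens the device, so the current drops. A real flow would call SPICE.
    """
    return 20.0 - 1.5 * d_pg1 - 1.0 * d_pd1

def worst_case_point(model, radius=6.0, steps=3600):
    """Scan the circle of the given sigma radius and return the point
    (d_pg1, d_pd1, current) with the lowest read current."""
    best = None
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        x = radius * math.cos(theta)
        y = radius * math.sin(theta)
        current = model(x, y)
        if best is None or current < best[2]:
            best = (x, y, current)
    return best

if __name__ == "__main__":
    d_pg1, d_pd1, i_min = worst_case_point(read_current)
    print(f"worst case: dVth(PG1)={d_pg1:.2f} sigma, "
          f"dVth(PD1)={d_pd1:.2f} sigma, Iread={i_min:.2f} uA")
```

    For this toy model the minimum lands where both Vth shifts are positive, that is, where both PG1 and PD1 are weaker than nominal.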

    Related - High Sigma Yield Analysis and Optimization at DAC

    Moving up one level of hierarchy from a single memory cell to the actual memory array we have a typical architecture that combines cells into columns, where at the bottom is a mux, equalizer and Sense Amp (SA) circuit:

    [Image: hierarchy.jpg]

    To run a hierarchical analysis and understand how variation affects this SRAM, there are challenges:

    1. For each SA there are N cells
    2. In Monte-Carlo sampling we have to take N cell samples for each SA sample
    3. Count a failure whenever tSA > tmax, then compute the failure rate
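
    The three steps above can be sketched in Python as a nested sampling loop. The timing model inside `sample_tsa`, the sigma values and the normalized units are all hypothetical placeholders for real SPICE simulations of the cell and sense amp.

```python
import random

def sample_tsa(n_cells, rng, sa_sigma=0.05, cell_sigma=0.10):
    """Draw one SA sample plus n_cells cell samples; return the slowest read time."""
    sa_offset = rng.gauss(0.0, sa_sigma)  # hypothetical normalized SA offset
    slowest = 0.0
    for _ in range(n_cells):
        # Hypothetical normalized cell current, clipped to stay positive.
        i_cell = max(1e-3, rng.gauss(1.0, cell_sigma))
        # Toy timing model: larger offset or smaller current means a slower read.
        slowest = max(slowest, (1.0 + abs(sa_offset)) / i_cell)
    return slowest

def failure_rate(n_sa, n_cells, t_max, seed=0):
    """Fraction of SA samples whose slowest cell read exceeds t_max."""
    rng = random.Random(seed)
    failures = sum(sample_tsa(n_cells, rng) > t_max for _ in range(n_sa))
    return failures / n_sa

def simulation_count(n_sa, n_cells):
    """Each SA sample needs n_cells cell samples."""
    return n_sa * n_cells

if __name__ == "__main__":
    print("failure rate:", failure_rate(n_sa=1000, n_cells=256, t_max=1.6))
    print("cell samples needed:", simulation_count(1000, 256))
```

    The point of the sketch is the cost structure: every SA sample drags N cell samples along with it, so the run count grows as the product of the two.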


    Brute Force Sampling
    One approach is brute force sampling: if our SRAM had 256 cells per SA and we wanted 10,000 results, that would require 256 * 10,000 = 2,560,000 simulation runs in a SPICE circuit simulator. Brute force Monte-Carlo analysis for SRAM design isn't really feasible for anything beyond 4 sigma, because it would require millions to billions of SPICE runs, which takes more time than we can afford to wait.

    [Image: mc-sample-size.jpg]
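
    To get a feel for why the sample count explodes, here is a rough Python sketch of the arithmetic. It assumes an ideal one-sided Gaussian tail, and the target of about 10 observed failures per run is an arbitrary choice made for illustration.

```python
import math

def tail_probability(sigma):
    """One-sided Gaussian tail probability P(X > sigma)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def samples_for_failures(sigma, expected_failures=10):
    """Samples needed so that, on average, `expected_failures` samples
    fall beyond the given sigma level (assumed target, not a standard)."""
    return expected_failures / tail_probability(sigma)

# Brute-force cost from the text: 256 cells per SA, 10,000 SA results.
brute_force_runs = 256 * 10_000

if __name__ == "__main__":
    print(f"brute-force runs: {brute_force_runs:,}")
    for k in (3, 4, 5, 6):
        print(f"{k} sigma: tail p = {tail_probability(k):.3e}, "
              f"samples for ~10 failures = {samples_for_failures(k):.3e}")
```

    Under these assumptions the required sample count passes a billion before 6 sigma is reached, which is the regime where plain Monte-Carlo stops being practical.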

    Scaled Sampling Approximation
    Another method is for the circuit designer to apply only 4 sigma variation to the SA while applying 6 sigma to the memory cell. This approach takes less effort than brute force and is easier to run; however, it provides an incorrect approximation.

    Two Stage Worst-Case Analysis (WCA)
    The approach recommended by MunEDA is to first calculate the 6 sigma worst-case condition of the cell, using the Voltage on the Bit Line (VBL) at the moment that the SA is enabled. The second stage is to calculate the 4 sigma worst-case condition of tSA for the sense amp, equalizer and MUX. Here are two charts showing the SA offset versus cell current, one where the worst-case point is in spec (green region) and one where it is out of spec (red region).

    [Image: two-stage-wca.jpg]
    Worst-case point is in spec, then out of spec


    You can also compare a sampling approach against the two stage WCA by looking at the following charts for SA offset versus cell current:

    [Image: sampling.jpg]
    The sampling approach estimates the failure rate by using sampling points: red dots for failing and green dots for passing. On the downside, sampling relies on tail distribution accuracy and suffers from sampling error. The distribution of local variation variables in the tails beyond 5 sigma is not well characterized, so the true tail distribution in silicon can differ significantly from the ideal Gaussian distribution used inside the model files. Running a global process Monte-Carlo does not guarantee coverage of the full corner range that can be seen in silicon.

    So a large local plus global Monte-Carlo run is infeasible because of long run times, and it is also sensitive to distribution errors. Even speeding up Monte-Carlo is not really sufficient, because it will just produce unsafe results in a shorter period of time. What we really need is a new method that can:

    1. Handle the large, structured, hierarchical netlist of SRAM arrays
    2. Adjust conservatism in the local variation tails
    3. Run quickly, so that local variation analysis can be repeated over multiple PVT corners, design options and layout options


    [Image: muneda-2-stage-wca.jpg]
    With two stage WCA we estimate the failure rate with a large-sample Monte-Carlo approximation in the pink region, use a conservative estimate in the pink plus green region, and show the worst-case point check as passing with a green dot. The tool flow GUI in WiCkeD makes it quite easy for a circuit designer to specify their own memory array size and target failure rate, and to trade off the array failure rate against read time:

    [Image: two-stage-wcd-flow-gui.jpg]
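
    The link between memory array size and target failure rate can be illustrated with a short Python sketch. It assumes that cells fail independently and that the tail is an ideal one-sided Gaussian; the function names are hypothetical and this is not the calculation WiCkeD performs internally.

```python
import math
from statistics import NormalDist

def array_failure_rate(p_cell, n_cells):
    """P(at least one of n_cells independent cells fails) = 1 - (1 - p)^n."""
    return -math.expm1(n_cells * math.log1p(-p_cell))

def cell_failure_target(p_array, n_cells):
    """Per-cell failure probability that yields the given array failure rate."""
    return -math.expm1(math.log1p(-p_array) / n_cells)

def equivalent_sigma(p_fail):
    """One-sided Gaussian sigma level whose tail probability equals p_fail."""
    return -NormalDist().inv_cdf(p_fail)

if __name__ == "__main__":
    n_cells = 1_000_000   # example array size (assumed)
    p_array = 1e-3        # example array failure target (assumed)
    p_cell = cell_failure_target(p_array, n_cells)
    print(f"per-cell target: {p_cell:.3e}")
    print(f"equivalent sigma: {equivalent_sigma(p_cell):.2f}")
```

    Using `expm1` and `log1p` keeps the arithmetic accurate when the probabilities are tiny, which is exactly the regime that high sigma analysis cares about.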

    Comparing all three analysis techniques for a 28nm SRAM block shows that the two stage WCA approach uses the least CPU effort in SPICE circuit simulations, scales well to high sigma analysis, and produces results close to full Monte-Carlo:


    [Image: comparisons.jpg]

    Related - When it comes to High-Sigma verification, go for insight, accuracy and performance

    Summary

    It is possible to design, analyze and optimize SRAM IP blocks using a two stage WCA approach that takes much less circuit simulation time than brute force Monte-Carlo while delivering comparable accuracy. All you need to add to your existing EDA tool flow is the WCA capability in the MunEDA WiCkeD tool.

    To find out more about MunEDA, there's a 30 minute webinar coming up on September 9th at 10AM (PST), register here.