
Intel’s 18A rumors meet a thermal brick wall says SemiWiki

BSPD is an HPC technology for all the reasons Ian pointed out above, and also because it costs more and requires new design techniques. The mobile guys do not want BSPD: they don't want the cost, it doesn't align with their needs, and they don't know how to design for it. TSMC has 2nm without BSPD, then A16 with BSPD (which I hear will be pretty much an NVIDIA node), then A14 without BSPD, and then an A14 follow-on with BSPD. The mobile guys will use 2nm and A14, and the HPC guys will use A16 and the A14 follow-on process. I have heard Intel may offer a 14A version without BSPD; if so, that would make them a mobile option once it comes out.
When you first mentioned this a while ago, it was very eye-opening and explained a lot of what I was hearing on 18A. Thanks for the input.
 
Agreed 100%, it's exactly what I've been saying. The problem for Intel is that quite apart from the fact that they've publicly nailed their colors to the BSPD mast, if they did a FSPD variant of 14A it would be late to market compared to TSMC, have much poorer IP support, and be more expensive due to generally higher wafer costs and lower yield at Intel -- especially since TSMC will be *at least* a year further down the yield curve at any point in time.

Given that TTM, KGD cost, and IP support are probably the three most critical things for mobile, and Intel would be at a disadvantage in all of these, it's difficult to see how they could be successful. Add in the existing uncertainty about how successful Intel will be in the foundry market even with their BSPD advantage (and whether they'll carry on investing enough to make this happen big-time), and you'd think it would be *very* difficult for Intel management to make the business case for doing 14A FSPD -- plus having to find the extra resources to develop/qualify a quite different process.

If Intel were going to do FSPD they should have already done the process development/qualification in parallel with BSPD 14A, but they didn't, so they've missed the boat -- at least for this node.
 
Not very scientific, but I found a couple of reviews using a common laptop chassis to try to compare 18A and N3B thermal performance (Panther Lake and Arrow Lake-H, respectively). Both tests used Cinebench in a loop to determine what power level the CPU could sustain for a given fan profile. Note that the Cinebench scores can't be directly compared since the reviews used different versions, but the drop from "best run" to "10 minutes heat-soaked" may be useful.

Power at Temp (measured by the CPU software) with fans set to performance:
- Arrow Lake-H can sustain about 35W ("high 80s C")
- Panther Lake can sustain 30W (77C) or 44W (92C) with the keyboard attached

Power/Temp in silent/whisper mode:
- Arrow Lake-H "down to about 20W at low 70s C"
- Panther Lake 20W at 67C

Performance - best run vs "10 minutes heat-soaked" - Cinebench multithreaded, fans in performance mode:
- Arrow Lake-H went from 17348 to 15627 in CB 2023, a drop of about 10%
- Panther Lake went from 1142 to 1103 in CB 2024, a drop of about 3.4%
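
For reference, a quick sanity check of those drop percentages -- a minimal Python snippet using only the scores quoted above, nothing beyond the reviews' numbers:

```python
# Heat-soak performance drop, computed from the review scores above.
scores = {
    "Arrow Lake-H (CB 2023)": (17348, 15627),
    "Panther Lake (CB 2024)": (1142, 1103),
}

for chip, (best, soaked) in scores.items():
    drop = (best - soaked) / best * 100
    print(f"{chip}: {best} -> {soaked} ({drop:.1f}% drop)")

# Arrow Lake-H (CB 2023): 17348 -> 15627 (9.9% drop)
# Panther Lake (CB 2024): 1142 -> 1103 (3.4% drop)
```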

Source 1: Panther Lake Asus Zenbook Duo: https://www.techpowerup.com/review/asus-zenbook-duo-ux8407/10.html
Source 2: Arrow Lake-H Asus Zenbook Duo: https://www.ultrabookreview.com/70717-asus-zenbook-duo-review-2025/

Take-aways: Using a similar (same?) laptop chassis, Panther Lake loses less performance going from 'first run' to 'running for 10 minutes' in a heat-soak benchmark. Power @ Temp also appears to be about equal for both chips, indicating the thermal resistance of an 18A chip might not be significantly worse than that of an N3B variant.

Full caveats that these are different architectures, and Intel's Panther Lake is a HUGE improvement in efficiency vs. prior Intel and current AMD offerings.

P.S. I think Arrow Lake-H is a bit of a better foil, as Lunar Lake just has too few cores for an 'even' comparison in thermals and performance. Also, without seeing teardowns, ASUS could have changed its cooling solution between the Arrow Lake-H and Panther Lake laptops, so this is definitely not a 'great' comparison.
 
The problem here is there are too many variables (different chips from different generations) -- and this includes the hidden one of where the on-chip temperature sensor is placed. These readings come from the sensor, which is invariably embedded in the substrate, not from the hottest parts of the circuit -- the high-power-density transistors and the metal attached to them. I know because we use sensors like this and have compared what they read against the critical temperatures (using thermal simulations), and they don't track well. The difference is bigger in N2 (nanosheet) than in N3 (FinFET), and a lot bigger still with BSPD.

The overall thermal resistance from chip to heatsink/outside world is not the biggest problem with BSPD, temperature differences across the die are.

Here are some example hotspot calculations for different processes at a power density of 100W/mm2 -- which sounds ludicrously high because it would be for the whole die, but it's equivalent to 10mW dissipated in a 10um x 10um circuit (or 0.1mW in 1um x 1um), which is not unusual for small high-speed circuits like clock drivers (we've actually seen even higher numbers inside very high-speed circuits like SERDES). This particular case shows that for "typical" circuit sizes at this power density, BSPD runs about 20C hotter than FSPD.
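
A minimal sketch of the area-to-power arithmetic above -- this is just unit conversion showing why 100W/mm2 is plausible at circuit scale; the ~20C BSPD-vs-FSPD delta comes from the thermal simulations, not from this:

```python
# Power dissipated by a small circuit at a uniform power density of 100 W/mm^2.
DENSITY_W_PER_MM2 = 100.0

def circuit_power_mw(width_um: float, height_um: float) -> float:
    """Power in mW for a width x height (in um) circuit at the given density."""
    area_mm2 = (width_um * 1e-3) * (height_um * 1e-3)
    return DENSITY_W_PER_MM2 * area_mm2 * 1e3  # convert W to mW

print(f"{circuit_power_mw(10, 10):.1f} mW")  # 10.0 mW for a 10um x 10um circuit
print(f"{circuit_power_mw(1, 1):.1f} mW")    # 0.1 mW for a 1um x 1um circuit
```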

Note that this is based on a particular set of assumptions and is *not* a general case that applies to all chips, but it's also not an uncommon one in high-performance devices. Also note that this does not include the very local self-heating down at the gate-stripe level (<0.1um) and in elevated fin/nanosheet devices (which can also be up to 20C or so), which comes on top of this...

[Attached image: hotspots.png -- hotspot calculations for different processes]
 

A few questions for my education -

How accurate are typical on-die temperature sensors? (just curious how much this affects variability, too)

Do modern CPUs / SoCs typically throttle portions of the chip (i.e. temperature sensors all over driving thermal decisions), or do they typically still lean towards looking at the hottest spot(s) and slowing down the entire chip accordingly?

..

The piece I found most interesting in the comparison was that the CPUs still appeared to be capable of sustaining roughly the same overall wattage in roughly the same form factor (i.e. the laptops should be very similar in cooling and chassis). I definitely appreciate local hotspots can differ significantly, and I also did not look up or include die sizes for reference, which is yet another variable that makes the comparison less accurate.

The power density for local hotspots in your writeup is very interesting (denser in places than I would have expected) -- thanks for that! I'm curious if GAAFET transmits heat between transistors/areas better than FinFET, and how that compares to planar. Is that sort of implied in the data there, or are there too many variables because of top/bottom cooling and insulation?
 
On-die sensors can be pretty accurate (a degree or two), especially if calibrated at production (some are) -- but they only measure the temperature where they are... :-(

Modern CPUs have lots of sensors all over the place, and in many cases can do individual throttling/dynamic supply-voltage control for different blocks, but this completely depends on the design -- in a lot of cases two blocks which communicate have to run at the same clock rate. ASICs which just continuously stream data at a fixed rate (like comms devices) can't use this trick; they have to run at a fixed rate/voltage.
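
To make the two policies concrete, here's a toy sketch -- all block names and the threshold are invented for illustration, and real designs do this in hardware/firmware, not Python:

```python
# Toy illustration of whole-chip vs. per-block thermal throttling.
T_LIMIT_C = 95.0  # hypothetical throttle threshold
sensor_temps = {"cpu_core0": 97.2, "cpu_core1": 88.4, "gpu": 91.0, "io": 76.5}

# Whole-chip policy: the hottest sensor slows everything down.
if max(sensor_temps.values()) > T_LIMIT_C:
    print("throttle the entire chip")

# Per-block policy: only over-limit blocks slow down -- which only works
# if those blocks don't have to share a clock with their neighbours.
for block, temp in sensor_temps.items():
    if temp > T_LIMIT_C:
        print(f"throttle {block} only")
```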

The plots I showed are for circuit/block-level self-heating, which is similar for all FSPD processes (but BSPD is a lot worse). On top of this you have device-level SHE (self-heating effect) within individual transistors, especially if they're made out of multiple stripes as many are -- for example, in this case the middle gates tend to run hotter than the end ones. You can reduce this effect by splitting the transistors up into paired gates with dummies in between, or even individual gates, but this increases area and parasitic capacitance (meaning, power consumption), so it's not a free lunch. BSPD has the additional problem that lateral heat-spreading away from hot transistors is also not as good as in FSPD, so these gate-level temperature differences are also bigger.

GAAFET has worse local hotspots (at the transistor level) than FinFET because the thermal connection to the substrate is poorer. FinFETs are a bit better, but still considerably worse than planar devices due to the tall thin fins, especially if the PMOS uses SiGe fins, which have lower thermal conductivity than silicon -- a nasty surprise if you're not expecting it; we saw PMOS SHE that was about twice as big as NMOS. GAAFETs are worse still, especially NMOS (the gap to PMOS is smaller since there are no SiGe fins).

The power densities I quoted are not unusual for high-speed circuits, or heavily-loaded ones like clock drivers -- in the worst cases we've seen even higher numbers, to the point where there was no choice other than to divide the circuit up into multiple parallel copies and spread these out, even though this was undesirable (high-speed VCO)... :-(

Of course this doesn't apply to *all* chips, but the TSMC recommendation for A16 is telling: "Suitable for HPC devices with dense power grids and active cooling"... ;-)
 
Thanks @IanD, this also explains a lot about the hurdles Intel faces in Foundry vs. also serving its own needs.

It seems a little bleak that many of the new technologies (new transistor types, BSPD), combined with the lack of SRAM scaling lately, seem to be decreasing the reusability of newer nodes across multiple product segments -- also reducing the cost benefit of (any remaining) scaling.

..

FWIW, the 100W/mm2 you're working with (= 10,000 W/cm2, since 1 cm2 = 100 mm2) is pretty far into "insane engineering territory". Below is in W/cm2 -- the very top of the chart is what you described :).

[Attached image: 1771525037751.png -- power density chart in W/cm2]
 