Microsoft reveals second generation of its AI chip in effort to bolster cloud business

There’s a big difference between “collecting dust” and “GPUs being underutilized”. Most existing AI data centers are significantly underutilized: GPUs are often busy well under two‑thirds of the time, and overall facility capacity is used even less efficiently.

- A 2024–2025 large‑scale AI infrastructure survey reports that over 75% of organizations see peak GPU utilization below 70%, meaning most accelerators sit idle a substantial fraction of the time even at “busy” periods.
- Industry practitioners estimate effective model FLOPs utilization (MFU) for many LLM fine‑tuning workloads at roughly 35–45%, implying that much of the theoretical compute capacity in installed GPUs is never turned into useful training (see the back‑of‑the‑envelope sketch after this list).
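
For a rough sense of where an MFU number in that range comes from, here is a back‑of‑the‑envelope sketch using the common ~6 × parameters × tokens approximation for transformer training FLOPs; the model size, token throughput, and per‑GPU peak are hypothetical values chosen only for illustration.

```python
# Back-of-the-envelope MFU estimate for a transformer fine-tuning run.
# Uses the common ~6 * params * tokens approximation for training FLOPs;
# all concrete numbers are illustrative, not measurements.

def model_flops_utilization(tokens_per_sec, n_params, n_gpus, peak_flops_per_gpu):
    """Achieved training FLOP/s divided by the fleet's theoretical peak FLOP/s."""
    achieved = 6 * n_params * tokens_per_sec      # FLOP/s actually turned into training
    peak = n_gpus * peak_flops_per_gpu            # FLOP/s the hardware could deliver
    return achieved / peak

# Hypothetical 7B-parameter fine-tune on 8 GPUs rated at ~1e15 FLOP/s (BF16) each.
mfu = model_flops_utilization(tokens_per_sec=75_000, n_params=7e9,
                              n_gpus=8, peak_flops_per_gpu=1e15)
print(f"MFU ≈ {mfu:.0%}")   # ≈ 39%, i.e. inside the 35–45% range cited above
```

Even with every GPU nominally “busy”, a number like this means more than half of the rated compute never becomes useful training.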

But the main cause is inefficient use of GPUs, not a lack of work for them.

- Common causes include suboptimal job scheduling, fragmentation across many small teams, over‑provisioning against peak demand, and software bottlenecks (I/O, networking, data pipelines) that stall GPUs.
- As a result, even though organizations experience GPU *scarcity* and keep ordering more hardware, the prevailing view in recent analyses is that most AI GPU fleets are materially underutilized rather than consistently running near full capacity.

So there’s a lot of room for improvement through data‑center‑level co‑optimization and management that better matches workloads to the available GPU resources.
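
As one illustration of what matching loads to GPU resources at the cluster level can look like, here is a minimal best‑fit placement sketch; the node sizes, job names, and policy are assumptions made up for the example, not a description of any particular scheduler.

```python
# Minimal sketch of cluster-level load/GPU matching: greedily place queued jobs
# on the node with the least free capacity that still fits them (best-fit),
# instead of statically over-provisioning per team. All names and sizes are made up.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    total_gpus: int
    used_gpus: int = 0

    @property
    def free_gpus(self):
        return self.total_gpus - self.used_gpus

def schedule(jobs, nodes):
    """Assign each (job_name, gpus_needed) pair to a node, largest jobs first."""
    placements = {}
    for job, need in sorted(jobs, key=lambda j: -j[1]):
        candidates = [n for n in nodes if n.free_gpus >= need]
        if not candidates:
            placements[job] = None            # would queue, preempt, or spill elsewhere
            continue
        best = min(candidates, key=lambda n: n.free_gpus)   # tightest fit -> less fragmentation
        best.used_gpus += need
        placements[job] = best.name
    return placements

nodes = [Node("node-a", 8), Node("node-b", 8)]
jobs = [("pretrain-shard", 8), ("finetune-7b", 4), ("eval-batch", 2), ("notebook", 1)]
print(schedule(jobs, nodes))
# {'pretrain-shard': 'node-a', 'finetune-7b': 'node-b', 'eval-batch': 'node-b', 'notebook': 'node-b'}
```

Production schedulers (Slurm, Kubernetes with GPU device plugins, and the like) layer preemption, gang scheduling, and topology awareness on top of the same basic idea.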



GPU supply doubled, but AI teams still starved. Why? - LinkedIn https://www.linkedin.com/posts/nehi...-why-are-ai-activity-7389840480945205249-B5-n
This makes sense. We could measure the utilization of many different kinds of accelerators, and we'd find that, broadly, the accelerators are poorly utilized, mostly because each one only does a portion of the work, and that portion appears episodic from the accelerator's perspective. In the case of GPUs, however, the accelerators are the most expensive and most power-hungry part of the system, so we pay more attention to them than we do to, say, network interface accelerators or storage accelerators. It sounds like the end-to-end processing architecture of AI-targeted hardware with specialized accelerators, especially GPUs, needs a global redesign for the next generation.
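
To make the measurement part concrete: at the GPU level it can be as simple as polling NVML's per-device busy counters. A rough sketch using the pynvml bindings (from the nvidia-ml-py package); the one-minute window and 70% threshold are arbitrary choices for the example.

```python
# Rough sketch: sample per-GPU utilization over a one-minute window via NVML.
# Requires the nvidia-ml-py package (imported as pynvml) and an NVIDIA driver.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

samples = {i: [] for i in range(len(handles))}
for _ in range(60):                            # 60 samples at ~1 Hz
    for i, h in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        samples[i].append(util.gpu)            # % of time a kernel was executing
    time.sleep(1)

for i, vals in samples.items():
    avg = sum(vals) / len(vals)
    status = "underutilized" if avg < 70 else "busy"
    print(f"GPU {i}: average busy {avg:.0f}% ({status})")

pynvml.nvmlShutdown()
```

Note that this "kernel executing" counter is a coarse signal and tends to overstate useful work relative to MFU-style measures, which is part of why the two kinds of numbers quoted earlier look so different.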
 
Maybe... Many, many companies built entire on-premises private datacentres just to host a website, an ERP system, and a few file servers. Corporate IT massively overbooks hardware almost as a rule.
What evidence do you have that this is the case? If anything, US companies are in a mad rush to move to cloud computing. I don't know much about internal datacenters in Asian and European companies.
 