
6 Graphs That Show Where the U.S. Leads China on AI—and Where It Doesn’t

Daniel Nenni

Admin
Staff member
Two important things happened on January 20, 2025. In Washington, D.C., Donald Trump was inaugurated as President of the United States. In Hangzhou, China, a little-known Chinese firm called DeepSeek released R1, an AI model that industry watchers called a “Sputnik moment” for the country’s AI industry.

“Whether we like it or not, we're suddenly engaged in a fast-paced competition to build and define this groundbreaking technology that will determine so much about the future of civilization,” said Trump later that year, as he announced his administration’s AI action plan, which was titled “Winning the Race.”

There are many interpretations of what AI companies and their governments are racing towards, says AI policy researcher Lennart Heim: to deploy AI systems in the economy, to build robots, to create human-like artificial general intelligence.

"I think in most metrics, the U.S. is clearly leading,” he says. But Heim notes that getting a clear picture of AI progress and adoption is challenging: “The best metrics are the numbers we don't have.”

These six graphs show where the U.S. is ahead of China, what’s driving that lead—and why it could be tenuous.

“Right now, compute is arguably the single biggest driver of AI progress,” says Daniel Kokotajlo, executive director of the AI Futures Project, a research group that forecasts the future of AI progress, referring to the computer chips used to train AI models.

That’s bad news for Chinese firms, whose access to compute has been limited since 2022, when the Biden administration restricted exports of the advanced manufacturing equipment used to produce AI chips, and then of the chips themselves in 2023.

“Money has never been the problem for us; bans on shipments of advanced chips are the problem,” said Liang Wenfeng, CEO of DeepSeek, in July 2024.

However, export rules announced in January by the Trump administration could give Chinese companies access to 890,000 of Nvidia’s H200 AI chips—more than double the number of chips that Chinese manufacturers are expected to produce in 2026, according to a report by the Center for a New American Security.

“Limited access to advanced chips has been the primary constraint on China’s AI progress. The new export rule will significantly boost China’s AI capabilities,” Janet Egan, one of the report’s authors, told TIME. “The U.S. is essentially equipping its leading strategic competitor.”

It remains to be seen whether the Chinese companies will be able to take advantage of the newly available chips—Chinese customs officials initially blocked imports of the chips, according to reports.

“China has a lot of incentive to look like it might be blocking chips, both in terms of its relationship with Chinese tech companies, because it wants to force them to buy domestic chips, and in terms of its relationship with Washington, because it wants to make Washington think that it doesn't need U.S. chips,” says Chris Miller, author of Chip War, a bestselling history of the semiconductor industry.

The success of DeepSeek’s R1 model was a sign of what can be achieved by a talented team with limited resources. A Stanford analysis found that more than half of the researchers responsible for the breakthrough “never left China for schooling or work,” challenging “the core assumption that the United States holds a natural AI talent lead.”

China produces far more top AI researchers than the U.S., according to an analysis of authors at NeurIPS, a top AI conference. Many of them end up working in the U.S., but the share working in China more than doubled between 2019 and 2022, and a new $100,000 price tag on visas for foreign talent may further “hurt the innovation and competitiveness of the U.S. industry,” Subodha Kumar, a professor at the Fox School of Business at Temple University, told TIME last year.

AI training is incredibly power-hungry. U.S. AI companies have been falling over each other to secure contracts with energy providers.

Chinese AI companies have a significant advantage in this regard. China has produced more energy than the U.S. since 2010. “Of all the key inputs into AI, energy is the one where the U.S. is least competitive,” says Miller.

For now, China’s AI development is bottlenecked by its lack of AI chips, but if its stock increases—either through relaxed export controls of American chips, or through increased domestic production—the country’s ready access to energy could be critical.

For the time being, America’s control of AI chips and larger share of top talent has allowed it to produce the world’s most capable large language models (LLMs). Chinese LLMs have lagged behind American models by seven months on average, according to Epoch AI, an AI research company.

Moreover, Chinese models’ competitiveness might be partly due to “distillation,” where developers use outputs from more capable models to train their own models, says Heim. Some users reported that Chinese firm DeepSeek’s model said that it was “ChatGPT, a language model developed by OpenAI,” when asked to identify itself.

“Without distillation, I expect the gap in AI model performance would be bigger,” Heim told TIME.

“Revenue is people paying for things they find useful,” says Miller. “The best metric, I think, of AI deployment is the revenue that accrues to AI products.”

Alibaba—which makes the Qwen series of models, among the most capable coming out of China—is publicly traded, and therefore is one of the country’s few AI developers that also publishes revenue figures.

However, developing Qwen is a side hustle for the company’s Cloud Intelligence division, the largest provider of cloud services in the country, making the division’s revenue an upper bound on what the company earns from its AI models.

Even so, it’s a figure that American AI startups—founded at least six years later and concentrated solely on AI development—are approaching rapidly. In September, Alibaba Cloud posted an annualized revenue of $22 billion. Two months later, OpenAI’s CFO Sarah Friar wrote that OpenAI had exceeded $20 billion.

 

Seems like the Chinese models have done us a real service in the West by giving us tutorials on how to build more efficient models, and consequently more power- and cost-efficient hardware. 2024 was all about fitting large, dense frontier transformer models into AI processor memory. 2026 is all about MoE‑first, multimodal, long‑context transformers, run on strongly disaggregated inference stacks (prefill/decode, or encode–prefill–decode for multimodal) with lots of scheduling and KV engineering on top. A big part of that pivot is due to the DeepSeek shock starting in early 2025.

1. Architecture: MoE‑first, multimodal, long‑context

  • Text‑only and multimodal frontier models now largely share a unified backbone: a transformer that handles long text context and multimodal tokens (images, sometimes audio/video) in one shared attention space, rather than separate towers.
  • At the high end, these backbones are almost always sparse Mixture‑of‑Experts in the decoder: total parameters in the 100B+ range with only a few billion active per token, letting them reach frontier quality at lower per‑token cost than similarly strong dense models.
  • Multimodal variants add encoders for images/audio/video that map raw inputs to token streams; those encoders are usually dense and relatively compact, while the big MoE action lives in the autoregressive decoder that fuses everything and generates text (and sometimes tokens that reference images or video regions).
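As a toy picture of the "one shared attention space" idea: project image patches and text tokens into the same embedding dimension and concatenate them into one sequence for the decoder. All shapes, projections, and the tiny vocabulary here are invented for illustration.

```python
import numpy as np

def patchify(img, patch=4):
    """Split an HxWxC image into flattened non-overlapping patches."""
    H, W, C = img.shape
    ph, pw = H // patch, W // patch
    patches = img[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch, C)
    return patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, patch * patch * C)

rng = np.random.default_rng(1)
d_model = 16

# Dense, compact "encoders": simple linear maps into the shared space
img_proj = rng.standard_normal((4 * 4 * 3, d_model)) * 0.05
txt_embed = rng.standard_normal((1000, d_model)) * 0.05   # toy vocab of 1000

image_tokens = patchify(rng.random((32, 32, 3))) @ img_proj   # (64, d_model)
text_tokens = txt_embed[[12, 7, 99, 3]]                       # (4, d_model)

# One shared attention space: the decoder sees a single interleaved sequence,
# not separate image and text towers
sequence = np.concatenate([image_tokens, text_tokens], axis=0)
print(sequence.shape)  # (68, 16)
```

A real model would add positional/modality embeddings and run attention over `sequence`; the point is only that both modalities end up as rows of one token matrix.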

2. MoE trends: experts for reasoning and modalities

  • For text‑dominant frontier models, MoE experts are increasingly specialized by skill (code, math, instruction‑following, multilingual, tool‑use), with routers tuned to maintain load balance and avoid dropping experts; this is where much of the “economics of inference” work is happening.
  • In multimodal models, the same MoE idea is extended to modality‑aware experts: some experts are better for dense visual grounding, others for long‑range textual reasoning, and some for audio/temporal patterns, with routing informed by both token type and global context.
  • The upshot: MoE plus specialization lets a single frontier model act like a collection of domain‑specific models without paying full dense cost, and this is becoming the default design for top reasoning and multimodal systems.
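The routing mechanics behind this can be sketched in a few lines. This is a toy top-k gate over toy linear experts (every name and shape is invented here, and the load-balancing auxiliary losses real routers need are omitted); the key property is that experts with no routed tokens cost nothing.

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x: (tokens, d_model) activations
    gate_w: (d_model, n_experts) router weights
    experts: list of callables mapping (n, d_model) -> (n, d_model)
    """
    logits = x @ gate_w                              # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]       # top-k expert ids per token
    sel = np.take_along_axis(logits, topk, axis=-1)  # their logits
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # softmax over selected only

    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        token_ids, slots = np.where(topk == e)       # tokens routed to expert e
        if token_ids.size == 0:
            continue                                 # inactive expert: no FLOPs spent
        out[token_ids] += weights[token_ids, slots, None] * expert(x[token_ids])
    return out

# Toy usage: 4 experts, only 2 active per token
rng = np.random.default_rng(0)
d, n_exp = 8, 4
experts = [(lambda W: (lambda h: h @ W))(rng.standard_normal((d, d)) * 0.1)
           for _ in range(n_exp)]
x = rng.standard_normal((5, d))
y = moe_layer(x, rng.standard_normal((d, n_exp)), experts, k=2)
print(y.shape)  # (5, 8)
```

Modality-aware routing fits the same skeleton: the gate simply learns (or is biased) to send image-patch tokens and text tokens to different experts.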

3. Disaggregated inference: from PD to EPD

  • For text‑only models, inference is now explicitly split into prefill (prompt ingestion, compute‑heavy) and decode (token‑by‑token generation, memory/bandwidth‑heavy); many systems run these on different GPU pools or hardware types to maximize utilization.
  • Recent work shows that neither “aggregate everything” nor “fully disaggregate” is always best, so schedulers are moving toward hybrid policies that can treat prefill and decode as partially shared or partially separate, depending on SLO and load.
  • For multimodal models, the same idea generalizes to Encode–Prefill–Decode (EPD): encoders for images/audio/video run in their own pool (large‑batch, throughput‑oriented), then a prefill stage fuses tokens, then decode runs on memory‑optimized, KV‑rich nodes; frameworks like EPD serving and HydraInfer formalize this.
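A deliberately toy Python sketch of the EPD split, with plain functions standing in for the encode/prefill/decode pools and an invented KVCache object as the baton handed between them. The token logic is fake; the handoff pattern is the point.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request keys/values handed off between the prefill and decode pools."""
    tokens: list = field(default_factory=list)

def encode(images):
    """Encoder pool: throughput-oriented, batches well across requests."""
    return [f"<img:{i}>" for i in range(len(images))]

def prefill(prompt_tokens, image_tokens):
    """Prefill pool: one compute-heavy pass over the fused prompt, building the KV cache."""
    return KVCache(tokens=image_tokens + prompt_tokens)

def decode(kv, max_new=3):
    """Decode pool: memory/bandwidth-bound, one token at a time against the KV cache."""
    out = []
    for _ in range(max_new):
        tok = f"tok{len(kv.tokens)}"   # stand-in for a model forward pass
        kv.tokens.append(tok)          # cache grows every step
        out.append(tok)
    return out

# Encode -> Prefill -> Decode, with the KV cache as the handoff between pools
kv = prefill(["hello", "world"], encode(["img_a"]))
print(decode(kv))  # ['tok3', 'tok4', 'tok5']
```

In a real serving stack each stage runs on a differently provisioned GPU pool and the KV cache transfer is the expensive, carefully engineered part.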

4. Systems stack: rack‑scale, MoE‑ and modality‑aware

  • The system focus has shifted from “just bigger data/tensor parallel” to “rack‑scale orchestration of experts, KV, and PD/EPD phases”: schedulers understand expert parallelism, KV cache placement/migration, and the different resource profiles of encode, prefill, and decode.
  • Specialized prefill/decode hardware proposals (and GPU scheduling policies) assume this disaggregated world and try to overlap prefill and decode as streaming pipelines, especially for multimodal workloads where encoders can be heavily batched.
  • In practice, production stacks for frontier LMMs look like: a front‑end router → multimodal encoders pool → prefill MoE pool → decode MoE pool, with KV handoff between pools and autoscaling tuned separately for P‑heavy vs D‑heavy vs encode workloads.
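In miniature, that pipeline shape (front-end router, separately sized pools, handoff between them) looks something like this. Thread pools stand in for GPU pools, the per-pool worker counts stand in for autoscaling, and the token logic is fake.

```python
from concurrent.futures import ThreadPoolExecutor

# Separate worker pools per phase, sized independently: encoders batch well,
# prefill is compute-heavy, decode is memory-bound and benefits from fan-out.
pools = {
    "encode": ThreadPoolExecutor(max_workers=4),
    "prefill": ThreadPoolExecutor(max_workers=2),
    "decode": ThreadPoolExecutor(max_workers=8),
}

def route(request):
    """Front-end router: push one request through encode -> prefill -> decode."""
    enc = pools["encode"].submit(lambda: [f"<img:{m}>" for m in range(len(request["media"]))])
    # "KV handoff": the prefill stage consumes the encoder output
    kv = pools["prefill"].submit(lambda: enc.result() + request["prompt"])
    out = pools["decode"].submit(
        lambda: [f"tok{i}" for i in range(len(kv.result()), len(kv.result()) + 2)])
    return out.result()

print(route({"media": ["a"], "prompt": ["hi", "there"]}))  # ['tok3', 'tok4']
```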

5. Training/inference tricks layered on top

  • Multi‑token prediction and related objectives are being used in both text‑only and multimodal transformers to improve long‑range reasoning and planning while lowering effective decoding cost per token, and they fit naturally with MoE decoders.
  • Token merging and sequence‑reduction techniques (especially for images and long contexts) are increasingly used to keep unified multimodal transformers tractable: fewer but more informative tokens feed into the same MoE backbone.
  • Combined with low‑precision formats and aggressive KV engineering, these methods act as horizontal multipliers across dense and MoE models, but are particularly valuable for the MoE‑heavy, multimodal, long‑context frontier.
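Token merging can be caricatured as repeatedly averaging the most similar pair of token embeddings. This greedy numpy sketch is a simplification (real schemes like ToMe use bipartite matching and track merge weights), but it shows the core trade: fewer tokens into the backbone at the cost of some information.

```python
import numpy as np

def merge_tokens(x, r):
    """Greedily merge the r most similar token pairs (toy token-merging sketch).

    x: (n, d) token embeddings; returns (n - r, d).
    """
    x = x.copy()
    for _ in range(r):
        unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        sim = unit @ unit.T                  # cosine similarity between tokens
        np.fill_diagonal(sim, -np.inf)       # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        x[i] = (x[i] + x[j]) / 2             # average the closest pair
        x = np.delete(x, j, axis=0)          # sequence shrinks by one
    return x

rng = np.random.default_rng(2)
tokens = rng.standard_normal((64, 16))       # e.g. 64 image-patch tokens
reduced = merge_tokens(tokens, r=16)         # 25% sequence reduction
print(reduced.shape)  # (48, 16)
```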
 
Slightly OT - I wonder if certain written/spoken languages are more efficient or better served by LLMs than other languages, e.g. does AI handle Chinese better than English, or the reverse?
 