




Thread: Google TPU - Die size, tech node, foundry?

  1. #1
    Member
    Join Date
    May 2016
    Posts
    1
    Thumbs Up
    Received: 0
    Given: 0

    Google TPU - Die size, tech node, foundry?

    Google's TPU - Tensor Processing Unit - was a surprising reveal yesterday. Anyone have thoughts on the die size and tech node? The heat sink in the EE Times story looks to be about 3 cm x 4 cm, which is roughly 1200 mm². Is there a rule of thumb, say 10% of the heat-sink area, for estimating the actual die size? Which foundry is making this chip? How many TPUs can you slot into a 1RU server - say a 2-socket Xeon? Seems like SATA connections are being used?
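
    If anyone wants to play with the back-of-the-envelope math, here is a minimal sketch (Python). The 3 cm x 4 cm heat-sink estimate and the 10% die-to-heat-sink area ratio are just the guesses from the question above, not established figures:

```python
# Rough die-size estimate from the heat-sink footprint.
# Both the heat-sink dimensions and the 10% area ratio are assumptions
# taken from the question above, not measured or published figures.

heatsink_w_mm = 30.0        # ~3 cm, eyeballed from the EE Times photo
heatsink_h_mm = 40.0        # ~4 cm
die_to_sink_ratio = 0.10    # assumed; real ratios vary widely with TDP and packaging

heatsink_area_mm2 = heatsink_w_mm * heatsink_h_mm       # 1200 mm^2
die_area_mm2 = heatsink_area_mm2 * die_to_sink_ratio    # 120 mm^2
die_edge_mm = die_area_mm2 ** 0.5                       # ~11 mm per side if square

print(f"heat sink: {heatsink_area_mm2:.0f} mm^2")
print(f"estimated die: {die_area_mm2:.0f} mm^2 (~{die_edge_mm:.1f} mm per side)")
```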


  2. #2
    Top Influencer
    Join Date
    Dec 2013
    Location
    EU
    Posts
    346
    Thumbs Up
    Received: 126
    Given: 31
    Google's Tensor Processing Unit (TPU) fits in a hard-drive slot of a server.
    Since it is a custom ASIC design, I would exclude Intel as the manufacturer. Bad news for NVIDIA too, I would say (even if it is not certain that Google has any plan to make this chip available on the market).
    Technology node? Most likely 20-22nm, based on Google's statement that they have been running TPUs inside their data centers for more than a year (for that reason I would exclude 14nm).
    Foundry? No idea, my bet is IBM at the moment.
    Oh God, Skynet is coming :-)



  3. #3
    Blogger Daniel Payne
    Join Date
    Sep 2010
    Location
    Tualatin, OR
    Posts
    2,956
    Thumbs Up
    Received: 273
    Given: 390
    The technology could even be the more cost-effective 28nm node, since very few customers used the 20nm node. We will likely have to wait for Google to open up their story a bit more to better understand the chip's specifications.

    Daniel Payne, EDA Consultant
    www.MarketingEDA.com
    503.806.1662

  4. #4
    Expert hist78
    Join Date
    Jan 2014
    Location
    Chicago
    Posts
    546
    Thumbs Up
    Received: 270
    Given: 259
    So, can we logically predict that in-house-designed SoCs/processors from Facebook, Amazon, or even Microsoft will soon be coming to their data centers or assembled into their products? Then there is no reason Apple won't develop its own SoCs/processors for Apple's servers and Macs.

    This is a really exciting moment. But it's bad news for Intel.


  5. #5
    Top Influencer
    Join Date
    Aug 2013
    Posts
    118
    Thumbs Up
    Received: 22
    Given: 5
    Tensor math isn't a trade you can learn in many places. In my experience, where I was, the people who taught it didn't know what they were doing. For selected applications it will be incredible, but the general public doesn't want it.


  6. #6
    ippisl
    Guest
    I've seen research on doing neural networks in analog 130nm chips, and they get really good power-consumption numbers; they could match Google's claimed 10x perf/W improvement. There is even research on using 40nm analog to get another 10x in perf/W.

    I don't know if Google went there, but it fits their brand of R&D.
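
    As a rough sanity check on how those claims would stack, here is a tiny sketch (Python). The numbers are purely illustrative placeholders mirroring the "10x ... another 10x" framing, not figures from the research or from Google:

```python
# Illustrative only: how claimed perf/W multipliers would compound.
# The baseline and both gain factors are placeholder assumptions.

baseline_perf_per_watt = 1.0   # normalized GPU baseline
tpu_gain = 10.0                # Google's claimed ~10x perf/W over contemporary GPUs
analog_gain = 10.0             # additional ~10x suggested for scaled analog designs

tpu_perf_per_watt = baseline_perf_per_watt * tpu_gain
analog_perf_per_watt = tpu_perf_per_watt * analog_gain   # gains multiply if independent

print(f"TPU-class:           {tpu_perf_per_watt:.0f}x the GPU baseline")
print(f"analog (if stacked): {analog_perf_per_watt:.0f}x the GPU baseline")
```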


  7. #7
    Influencer
    Join Date
    Feb 2016
    Posts
    53
    Thumbs Up
    Received: 35
    Given: 37
    Interesting paper:

    ISCApaperv3 (2).pdf - Google Drive

    The TPU project actually began with FPGAs, but we abandoned them when we saw that the FPGAs of that time were not competitive in performance compared to the GPUs of that time, and the TPU could be much lower power than GPUs while being as fast or faster, giving it potentially significant benefits over both FPGAs and GPUs.

    Note: today's FPGAs are better than the FPGAs of that time, but today's GPUs are also much better (Nvidia is talking about 10x improvements for specific cases).

    Catapult V1 runs CNNs (using a systolic matrix multiplier) 2.3X as fast as a 2.1 GHz, 16-core, dual-socket server [Ovt15a]. Using the next generation of FPGAs (14-nm Arria 10) of Catapult V2, performance might go up to 7X, and perhaps even 17X with more careful floorplanning [Ovt15b]. Although it's apples versus oranges, a current TPU die runs its CNNs 40X to 70X versus a somewhat faster server (Tables 2 and 6). Perhaps the biggest difference is that to get the best performance the user must write long programs in the low-level hardware-design language Verilog [Met16][Put16] versus writing short programs using the high-level TensorFlow framework. That is, reprogrammability comes from software for the TPU rather than from firmware for the FPGA.
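
    For anyone wondering what the "systolic matrix multiplier" mentioned above actually does, here is a small software sketch (Python/NumPy) of a weight-stationary systolic array: the weights stay resident in a grid of multiply-accumulate cells, activations stream in from the left with a one-cycle skew per row, and partial sums flow down the columns. This is only a toy illustration of the data flow, not the actual TPU or Catapult microarchitecture:

```python
import numpy as np

def systolic_matmul(X, W):
    """Toy cycle-by-cycle model of a weight-stationary systolic array computing X @ W.

    PE (k, n) permanently holds W[k, n]. Activation X[m, k] enters row k from
    the left at cycle m + k and moves one column right per cycle; partial sums
    move one row down per cycle, so Y[m, n] drops out of column n K + n cycles later.
    """
    M, K = X.shape
    _, N = W.shape

    a = np.zeros((K, N))   # activation register inside each PE
    p = np.zeros((K, N))   # partial-sum register inside each PE
    Y = np.zeros((M, N))

    for t in range(M + K + N):                 # enough cycles to drain the pipeline
        # Collect finished results: p[K-1, n] holds Y[m, n] at cycle t = m + K + n.
        for n in range(N):
            m = t - K - n
            if 0 <= m < M:
                Y[m, n] = p[K - 1, n]

        # All PEs update in lock-step from the previous cycle's registers.
        a_next = np.zeros_like(a)
        p_next = np.zeros_like(p)
        for k in range(K):
            for n in range(N):
                if n == 0:                     # left edge: skewed injection of X
                    m = t - k
                    a_in = X[m, k] if 0 <= m < M else 0.0
                else:                          # otherwise take the left neighbour's activation
                    a_in = a[k, n - 1]
                p_in = p[k - 1, n] if k > 0 else 0.0   # partial sum from the PE above
                p_next[k, n] = p_in + a_in * W[k, n]   # multiply-accumulate
                a_next[k, n] = a_in                    # pass the activation along
        a, p = a_next, p_next

    return Y

# Quick check against a plain matrix multiply.
X = np.arange(12, dtype=float).reshape(3, 4)
W = np.arange(8, dtype=float).reshape(4, 2)
assert np.allclose(systolic_matmul(X, W), X @ W)
```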

