
Thread: ? on Microsoft and Nvidia HGX-1 Platform for AI Machine Learning

  1. #1
    Expert
    Join Date
    Apr 2013
    Location
    East SF Bay Area
    Posts
    1,106
    Thumbs Up
    Received: 305
    Given: 268

    ? on Microsoft and Nvidia HGX-1 Platform for AI Machine Learning

    With Microsoft and Nvidia using open source for progress, is this likely to be a platform that yields significant results, or more a PR exercise to intimidate or scare off competition? Any thoughts or comments on this would be appreciated. Could this become the standard for AI/Machine Learning, or are there other platforms that could dominate this area? Microsoft has been able to set the standard in the past, even when its offering was nowhere near the best option. Could this happen again?

    https://www.forbes.com/sites/patrick...g#31d28b0b4d59


  2. #2
    Member
    Join Date
    Sep 2014
    Posts
    22
    Thumbs Up
    Received: 36
    Given: 8
    Let's try to approach this top-down; it might make more sense:

    In 2011, Facebook 'open sourced' its hardware design in the 'Open Compute Project' (OCP) [1]. I'd describe the idea of OCP as creating an open ecosystem of LEGO (R)-like hardware, where the sizes and interfaces of the LEGO (R) building blocks are free for everybody to use, without the risk of infringing patents.

    The OCP is organized in different groups: for example a data center design group, where air conditioning and the like are standardized, a storage group, and also the OCP Server Project. The OCP Server Project has, as its deliverables, specification standards for electrical interfaces, mechanical interfaces, manageability, and a debug & test framework [2].

    One of the designs that has been contributed to the OCP Server Project, by Microsoft in Nov '16, is 'Project Olympus'. The goal of Olympus is to explore, invent, collaborate, enhance and make solutions using Olympus modular building blocks [3], so basically to create 'server-LEGO'. LEGO parts may be the rack, rack manager, server case, power supply, motherboard (that's where Foxconn comes in, I assume), firmware, and firmware and software APIs.

    Let's look at the Project Olympus partners, in order of appearance in their blog [4]:
    1) Intel, one of the five founding members of OCP, has Olympus LEGO-blocks for Xeon processors of the Skylake generation, Altera FPGAs and Intel Nervana (AI) solutions. Please note that Microsoft has been doing research with FPGAs on improved search algorithms, speeding up 'search times' by orders of magnitude, and not by the few percent that improved CPUs would normally deliver [5].
    2) AMD brings the upcoming 'Naples' to the table. This is great for AMD, as it seems they can offer Microsoft plug&play LEGO-blocks by the time their new server CPUs arrive!
    3) Qualcomm & Cavium, a.k.a. the ARM-server gang. There are demonstrations with both Qualcomm Centriq and Cavium ThunderX2 [6]. Note that Qualcomm also has an AI platform, named 'Zeroth'. I'm not sure if it's in their Centriq family for servers, but it's part of their Kryo SoCs for consumer hardware [7].
    4) And eventually, late to the party, wait for it.... Nvidia! Of course, they couldn't stay behind. Ingrasys is the partner here, which comes as no surprise seeing that Quanta (also a server builder, but mostly famous as a laptop ODM for Dell, HP et al.) is also part of Project Olympus. So NVidia entering OCP and contributing to Project Olympus is nothing extraordinary; it would be extraordinary if they didn't contribute to OCP. Also taking part are most of the usual suspects: Broadcom, Samsung, Marvell, Mellanox, TI, HP & Dell.

    So, what is HGX-1? Microsoft describes it as "a hyperscale GPU accelerator chassis for AI". Ah, so it's merely an open spec for a chassis! That's all there is to it, and that's all that's "open". It probably isn't even designed by NVidia themselves; they're not chassis designers. At least they didn't do the design without help from Foxconn and Microsoft. I wouldn't be surprised if NVidia was asked. So, what can you put in this 'open spec' chassis? Well, you can put NVidia "Pascal" cores in it, which use the 'proprietary / closed' NVLink high-speed multi-GPU interconnect, pretty much an "NVidia / IBM only technology" [11]. And since IBM is not involved and Microsoft Azure probably doesn't run on Power ISA anyway, I'd say it's NVidia only. Of course, you program these Pascal cores in CUDA, also a 'closed source proprietary NVidia only' technology, though CUDA is provided as freeware and there is documentation on CUDA available [10].

    So, back to your question: "With Microsoft and NVidia using open source for progress". In your words, that would be a PR exercise. HGX-1 is an "open chassis spec" running proprietary, closed software and hardware. Open source progress would be NVidia joining CCIX [8] or Gen-Z [9], or abandoning CUDA in favour of OpenCL, and Microsoft open sourcing their Windows server platform. However, in both CCIX and Gen-Z, NVidia would have to compete with ARM, AMD and Xilinx.

    Is HGX-1 going to take over the world? Clearly, that's unlikely. With LEGO, you have your red, yellow, blue and green LEGO bricks. Once designed, did the green LEGO brick take over the world? Of course, it has been popular; red bricks can be ditched in favour of green bricks. But on the other hand, because it's LEGO, green bricks could just as easily be ditched for red bricks, if red bricks proved better and cheaper. In the end, that's one of the best parts of open source: being able to exchange whole subsystems for other subsystems. Once AI algorithms 'stabilize', somebody could design an AI ASIC which is better at specific AI tasks than GPUs. Put that ASIC in the yellow LEGO brick, make it part of the Project Olympus family, and Microsoft could swap from GPU to ASIC almost overnight. Besides, HGX-1 is part of Project Olympus, and Olympus is clearly aimed at making life better for Microsoft. What does that say about NVidia being able to serve Google's, Baidu's or Tencent's needs? Not that much, after all.

    I hope this is enough background info for you to decide the answer for yourself; I think my expectations are clear.

    [1] About >> Open Compute Project
    [2] Server >> Open Compute Project
    [3] Server/ProjectOlympus - OpenCompute
    [4] Ecosystem momentum positions Microsoft’s Project Olympus as de facto open compute standard | Blog | Microsoft Azure
    [5] Microsoft Supercharges Bing Search With Programmable Chips | WIRED
    [6] https://azure.microsoft.com/en-us/bl...ns-in-silicon/
    [7] https://www.qualcomm.com/invention/c...chine-learning
    [8] http://ccixconsortium.org/
    [9] http://genzconsortium.org/
    [10] http://www.nvidia.com/object/cuda_home_new.html
    [11] https://blogs.nvidia.com/blog/2014/1...hat-is-nvlink/

    Last edited by hkwint; 04-29-2017 at 05:52 AM.
     

  3. #3
    Expert
    Join Date
    Apr 2013
    Location
    East SF Bay Area
    Posts
    1,106
    Thumbs Up
    Received: 305
    Given: 268
    hkwint, thanks for the detailed and excellent explanation.

