  • ARMs in the Clouds

    The most interesting session at the Linley Tech Data Center Conference last week was the last one, on Designing Power Efficient Servers. What this was really about was whether ARM would have any success in the server market and what Intel's response might be.

    Datacenters are now very focused on power efficiency, and many track Power Usage Effectiveness (PUE), which is the ratio of the total power drawn by the facility to the power used by the IT equipment itself, so it captures the overhead of everything else (cooling, power distribution, backup etc). A PUE of 2 is average and new facilities target 1.5. Power is generally the limiting factor on the size of a rack and the size of a datacenter, so further improvements are required. More than a third of the cost of ownership of a datacenter is proportional to the electrical usage. So despite the obvious issue with a change of architecture (porting software), big savings, if they can be made, can be truly compelling.
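
    To make the PUE figures concrete, here is a minimal sketch using the standard definition (total facility power divided by IT equipment power). The 1,000 kW IT load is a hypothetical number chosen only for illustration, and the function names are mine, not from the conference.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Power spent on everything other than the IT load at a given PUE."""
    return it_equipment_kw * (pue_value - 1.0)


# Hypothetical IT load (assumption, not a figure from the talk).
IT_LOAD_KW = 1000.0

for target in (2.0, 1.5):
    extra = overhead_kw(IT_LOAD_KW, target)
    print(f"PUE {target}: {extra:.0f} kW of overhead on a {IT_LOAD_KW:.0f} kW IT load")
```

    In other words, at a PUE of 2 the facility burns as much power on overhead as on computing, and moving to 1.5 halves that overhead.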

    Historically, server processors have focused on complex, highly superscalar CPUs designed for the best possible single-thread performance. But all that instruction reordering wastes power, as do very long pipelines and high clock rates. For many datacenters focused on heavy computation this is the right type of server. But many other datacenters are focused on highly threaded workloads that can easily take advantage of more cores per chip/server/rack. There are also opportunities to integrate high-speed I/O and networking all on the same chip.

    The obvious beneficiary of this way of thinking is ARM. They announced the 64-bit Cortex-A57 (no, I don't understand ARM's numbering system either) focused on this opportunity. Intel has responded with Centerton, which is their first server processor based on Atom. But, as Linley pointed out, it only has the same level of integration as Xeon and so requires external USB, Ethernet, disk controllers etc. TDP is 6.3W at 1.6GHz, which looks nice until you consider that its performance is so much lower than Xeon's that it is, in fact, less power efficient than Xeon. Intel's next generation will be Avoton (no, I don't understand Intel's naming system either) in 22nm with second generation Atom architecture and "integrated system fabric." But details have not yet been announced.

    So who is working on alternatives? Tilera has repurposed their massively multicore processor for cloud servers. Calxeda is shipping an ARM-based server processor. AppliedMicro and Cavium are developing 64-bit ARM CPUs and AMD has announced that it will use the Cortex-A57 in 2014 server products.

    Calxeda presented their roadmap. Today they can put 3,000 servers in a single rack with 12,000 cores and 12TB of DRAM, cutting power requirements by 90% and eliminating 9 miles of cabling and 125 Ethernet switches. That's the sort of thing that will get the attention of Google, Facebook and Amazon.

    They had an interesting example: the server capacity needed to service 10,000,000 HTTP requests per second on a 1Gb network infrastructure. The densest x86 solution requires 1997 servers in 4 racks with 44 switches and consumes 37kW. Using Calxeda's ARM-based SoCs, this is 1535 servers in 1.6 racks with 2 switches and 13kW of power. That is 40% lower TCO, 61% less space, 95% fewer switches and 65% less energy. It is in the same direction as their elevator pitch: 1/10th of the energy in 1/10th of the space and 1/2 the TCO and all the performance.
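
    The quoted percentages can be sanity-checked from the raw numbers. The sketch below is only my own arithmetic on the figures above; the `Deployment` class and `percent_reduction` helper are illustrative names, and TCO is left out because no cost figures were given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    servers: int
    racks: float
    switches: int
    power_kw: float


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# Figures as quoted in the Calxeda example.
X86 = Deployment(servers=1997, racks=4, switches=44, power_kw=37)
ARM = Deployment(servers=1535, racks=1.6, switches=2, power_kw=13)

for field in ("servers", "racks", "switches", "power_kw"):
    saving = percent_reduction(getattr(X86, field), getattr(ARM, field))
    print(f"{field}: {saving:.1f}% lower")
```

    The switch and power savings round to the quoted 95% and 65%. Rack space comes out at 60% rather than 61%, presumably because the rack counts themselves were rounded.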

    AppliedMicro had a similar message. There are computationally intensive workloads such as high-frequency trading or data-mining. But many cloud workloads are not like that and can take advantage of lots more cores even if the scalar performance of each one is not the maximum. Their X-Gene microserver integrates network, I/O and storage all on one chip.

    They have their comparison example too. A traditional server architecture providing 2560 cores requires 160 nodes and 28kW in 2 racks. With X-Gene, the same 2560 cores require 320 nodes and 19kW in just 1 rack. So half the size and roughly two-thirds of the power.
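
    Normalizing the AppliedMicro numbers per core makes the comparison easier to read. Again, this is a rough sketch of my own arithmetic on the quoted figures, with illustrative function names, not anything AppliedMicro presented.

```python
def cores_per_node(cores: int, nodes: int) -> float:
    """Average number of cores in each node."""
    return cores / nodes


def watts_per_core(power_kw: float, cores: int) -> float:
    """Average power per core, in watts."""
    return power_kw * 1000.0 / cores


CORES = 2560  # same core count in both configurations

# (nodes, power in kW, racks) as quoted in the example.
configs = {
    "traditional": (160, 28.0, 2),
    "X-Gene": (320, 19.0, 1),
}

for name, (nodes, power_kw, racks) in configs.items():
    print(
        f"{name}: {cores_per_node(CORES, nodes):.0f} cores/node, "
        f"{watts_per_core(power_kw, CORES):.1f} W/core, {racks} rack(s)"
    )

# Ratio of X-Gene power to traditional power for the same core count.
print(f"power ratio: {19.0 / 28.0:.2f}")
```

    So X-Gene uses twice as many, smaller nodes (8 cores each instead of 16) and still fits in half the rack space, drawing about 68% of the power.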

    The server processor market is about $10B today so it is a prize worth fighting over. And, unlike the smartphone processor market, the major players are not designing their own SoCs so the whole market is available for merchant suppliers.