Buy American – The Path to Next-Gen 100 Pflop/s Supercomputers

By Charles King, Pund-IT, Inc.  April 15, 2015

Intel’s announcement last week of its selection by the Department of Energy (DoE) to build a next-generation supercomputer for the Argonne National Lab is a bellwether in U.S.-sponsored high performance computing (HPC). The event follows announcements last November that the DoE had chosen IBM, NVIDIA and Mellanox to build supercomputers for the Oak Ridge and Lawrence Livermore Labs.

All of these systems are part of the DoE’s multimillion-dollar CORAL initiative to build state-of-the-art “leadership” systems that are five to seven times more powerful than its current best supercomputers and will be available for use in the 2017-18 timeframe. All are designed to deliver over 100 petaflops (Pflop/s) in peak performance or about twice that of the world’s current top-ranked supercomputer, the Tianhe-2 (MilkyWay-2) system at the National Super Computer Center in Guangzhou, China.

The “Aurora” system awarded to Intel will consist of 50,000 nodes powered by 3rd gen Intel Xeon Phi microprocessors, 2nd gen Intel Omni-Path interconnects and the Intel Lustre file system. Planned performance will be 180 Pflop/s (with an option to increase performance to 450 Pflop/s) while consuming 13 MW of power. That’s 18X better performance and 6X better energy efficiency than Argonne’s current Mira system. Aurora will be built in conjunction with Intel’s strategic partner, Cray, which will provide its next gen “Shasta” supercomputing architecture, a scalable software stack, development and manufacturing expertise and onsite support.
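
The ratios quoted for Aurora can be sanity-checked with a little arithmetic. The short Python sketch below uses only the figures given in this article (180 Pflop/s, 13 MW, 18X, 6X). The values it derives for Argonne’s current system are back-calculated from those ratios and are not official specifications, and all names in the code are illustrative.

```python
# Figures quoted in the article for Aurora.
AURORA_PEAK_PFLOPS = 180.0   # planned peak performance, Pflop/s
AURORA_POWER_MW = 13.0       # planned power consumption, MW
PERFORMANCE_RATIO = 18.0     # "18X better performance"
EFFICIENCY_RATIO = 6.0       # "6X better energy efficiency"


def gflops_per_watt(peak_pflops: float, power_mw: float) -> float:
    """Energy efficiency in Gflop/s per watt.

    1 Pflop/s = 1e6 Gflop/s and 1 MW = 1e6 W, so the scale factors cancel.
    """
    return (peak_pflops * 1e6) / (power_mw * 1e6)


aurora_efficiency = gflops_per_watt(AURORA_PEAK_PFLOPS, AURORA_POWER_MW)

# Assumption: these are implied values for the current system, derived
# purely from the article's ratios rather than from published specs.
implied_current_peak_pflops = AURORA_PEAK_PFLOPS / PERFORMANCE_RATIO
implied_current_efficiency = aurora_efficiency / EFFICIENCY_RATIO

# Pflop/s divided by (Gflop/s per W) gives units of 1e6 W, i.e. MW.
implied_current_power_mw = implied_current_peak_pflops / implied_current_efficiency

print(f"Aurora efficiency:          {aurora_efficiency:.1f} Gflop/s per W")
print(f"Implied current peak:       {implied_current_peak_pflops:.1f} Pflop/s")
print(f"Implied current efficiency: {implied_current_efficiency:.1f} Gflop/s per W")
print(f"Implied current power:      {implied_current_power_mw:.1f} MW")
```

Taken at face value, the article’s numbers put Aurora at roughly 13.8 Gflop/s per watt and imply a current system of about 10 Pflop/s drawing a little over 4 MW. In other words, an 18X gain in performance for about 3X the power.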

The “Summit” system at Oak Ridge and the “Sierra” system at Livermore will be hybrid supercomputers with 3,000+ nodes combining next gen IBM POWER CPUs, next gen NVIDIA GPUs based on the company’s Volta architecture and its NVLink interconnect technology, and a state-of-the-art interconnect incorporating built-in intelligence that IBM is implementing with Mellanox. The result will be what IBM said are “data centric” systems boasting peak performance well in excess of 100 Pflop/s that are capable of moving masses of data to the processors at more than 17 petabytes per second.

Next gen evolution for HPC

The roles played by next generation technologies are obviously critical in all these systems, but their development methodologies and broader implications are even more profound. For example:

  • All are being built collaboratively, a stark contrast to traditionally monolithic supercomputing installations. Aurora certainly qualifies as a showpiece for Intel’s next gen HPC technologies, but Cray is an ideal collaborator to develop, manufacture and deploy the new system. IBM, NVIDIA and Mellanox are all members of the OpenPOWER Foundation, which was created to explore opportunities around IBM’s open source Power architecture. In fact, it could be argued that the Summit and Sierra design wins are a direct result of the relationships forged in OpenPOWER.
  • All will offer more than double the performance of current state-of-the-art supercomputers.
  • Along with delivering significantly higher performance than the systems they will replace, all will do so while consuming significantly less energy.
  • All focus great attention on the role that data movement (related to fabric and interconnect technologies) increasingly plays in supercomputing. Supercomputing is no longer so much a matter of making systems that are phenomenally fast as it is of building solutions for effectively crunching vast stores of data that are getting ever vaster.

Finally, and perhaps most importantly, consider the commercial implications of these new systems. The half a billion dollars and change that the DoE is dedicating to Aurora, Summit and Sierra isn’t a paltry sum, but none of the involved vendors will get rich from the deal. However, supercomputing is becoming increasingly mainstream for a growing number of businesses, and we expect all of these systems to yield information and insights that will help their respective vendors to explore and enjoy future commercial market opportunities.

Supercomputing and national pride

It’s also worth considering whether or how these new systems indicate any underlying changes in what might be called the culture of supercomputing. The highest levels of compute performance have always engendered national pride of one sort or another. But the rise of industry standard technologies, including Intel’s Xeon microprocessors, has enabled virtually every country with technological aspirations to jump on the supercomputing bandwagon.

The results, as displayed twice a year in the Top500.org list of the world’s best performing supercomputers, have been increasingly international in recent years. In fact, while Top500 lists were long dominated by well-known government research facilities and universities in the U.S., Europe and Asia, recent lists have included a host of new institutions and commercial companies, and China’s Tianhe-2 installation has topped the past four lists. The CORAL initiative will certainly replace aging systems at three of the DoE’s key national laboratories, but it should also land U.S. systems and facilities at or near the apex of global supercomputing when they finally come online.

Final analysis

Delivering world-class supercomputing is a statement of technological leadership and national pride, but it also awards participating vendors bragging rights that can result in billions of dollars in sales. As supercomputing and HPC increasingly become part of mainstream business IT, the commercial value of projects like the DoE’s CORAL initiative will also increase. That’s great for the vendors involved, as well as the IT industry as a whole, but it also has broader implications.

More and more, IT concerns itself with problems that once would have been considered unsolvable for various reasons – size, complexity, varieties of information and sources, and sheer masses of data. At the furthest reaches of those sorts of problems, supercomputers are delivering increasingly valuable, actionable solutions today. The promise of the DoE’s upcoming Aurora, Summit and Sierra systems is that innovative collaborations among forward-thinking IT vendors will help ensure that there will also be answers to complex problems tomorrow.

© 2015 Pund-IT, Inc. All rights reserved.