IBM, NVIDIA, Oak Ridge Labs and the Summit of Supercomputing

By Charles King, Pund-IT, Inc.  June 13, 2018

Supercomputers and other top-end high performance computing (HPC) installations have long defined and delivered the bleeding edge of compute performance. However, the underlying systems in those projects often reflect and portend broader changes in the commercial IT marketplace.

That was certainly the case during the steady move away from the proprietary technologies and highly customized systems that once ruled supercomputing toward servers leveraging Intel and AMD x86 CPUs and other industry-standard components. As a result of those changes, supercomputing and HPC have become increasingly affordable and available for mainstream use cases.

A similar fundamental shift is relevant to the new Summit installation revealed this week by the Department of Energy’s (DoE’s) Oak Ridge National Laboratory and IBM, which now qualifies as the world’s leading supercomputer. Let’s take a closer look at that announcement.

Summit by the numbers

So exactly what is Summit and why is it so special? As noted in the Oak Ridge Labs announcement, Summit consists of 4,608 IBM AC922 servers, each containing two 22-core IBM POWER9 processors and six NVIDIA Tesla V100 GPU accelerators interconnected with dual-rail Mellanox EDR 100Gb/s InfiniBand. In addition, Summit possesses over 10 PB of memory paired with fast, high-bandwidth pathways for efficient data movement, and 250 PB of high performance IBM software-defined storage.
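To put those per-server figures in system-wide terms, the short Python sketch below multiplies them out. It uses only the counts quoted above; the variable names are illustrative and do not come from the announcement.

```python
# Per-server figures as quoted in the Oak Ridge announcement.
NODES = 4608          # IBM AC922 servers
CPUS_PER_NODE = 2     # POWER9 processors per server
CORES_PER_CPU = 22    # cores per POWER9 processor
GPUS_PER_NODE = 6     # NVIDIA Tesla V100 accelerators per server

# System-wide totals derived by simple multiplication.
total_cpus = NODES * CPUS_PER_NODE          # 9,216 POWER9 processors
total_cores = total_cpus * CORES_PER_CPU    # 202,752 POWER9 cores
total_gpus = NODES * GPUS_PER_NODE          # 27,648 V100 GPUs

print(f"POWER9 processors: {total_cpus:,}")
print(f"POWER9 cores:      {total_cores:,}")
print(f"Tesla V100 GPUs:   {total_gpus:,}")
```

In other words, GPUs outnumber CPUs three to one across the system, which underlines how central the accelerators are to the design.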

The combination of POWER9 CPUs, NVIDIA Tesla GPUs and other high performance components and subsystems qualifies as an evolution of the hybrid CPU–GPU architecture successfully pioneered by the 27-petaflops Titan supercomputer Oak Ridge deployed in 2012, but what a difference six years can make.

Today, Summit can achieve peak performance of 200,000 trillion calculations per second—or 200 petaflops. That’s 8X faster than Titan and about 60% faster than the 125 petaflops delivered by what was until now the world’s fastest supercomputer: the Sunway TaihuLight installation at China’s National Supercomputing Center in Wuxi.
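The comparisons above are simple ratios of peak performance, and the sketch below reproduces them from the petaflops figures quoted in this article (200 for Summit, 27 for Titan, 125 for Sunway TaihuLight). The helper name speedup is hypothetical and used only for illustration.

```python
def speedup(new_pflops: float, old_pflops: float) -> float:
    """Return how many times faster the new system's peak is than the old one's."""
    return new_pflops / old_pflops

# Peak performance figures (petaflops) as quoted in the article.
SUMMIT = 200.0
TITAN = 27.0
TAIHULIGHT = 125.0

vs_titan = speedup(SUMMIT, TITAN)            # about 7.4, quoted as roughly 8X
vs_taihulight = speedup(SUMMIT, TAIHULIGHT)  # 1.6, i.e. about 60% faster

print(f"Summit vs. Titan:      {vs_titan:.1f}x")
print(f"Summit vs. TaihuLight: {vs_taihulight:.1f}x "
      f"({(vs_taihulight - 1) * 100:.0f}% faster)")
```

Based on the quoted peak numbers alone, the Titan ratio works out to about 7.4, so the 8X figure is best read as a rounded value.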

More importantly, Oak Ridge and IBM designed Summit to integrate traditional HPC scientific discovery workloads and artificial intelligence (AI) functions, making it not only the world’s leading supercomputer but the first capable of supporting successful exascale (exaops) scientific calculations.

That capability was tested by an Oak Ridge team that used Summit for a comparative genomics calculation previously run on Titan. The results? Summit’s 1.88 exaops performance delivered identical results but in a fraction of the time required by Titan.

The rise and triumph of hybrid supercomputing

As noted by both Oak Ridge and IBM, in addition to the scientific modeling and simulations that constitute many supercomputer-based research projects, Summit’s ability to integrate AI and scientific discovery will enhance numerous experiments and open the door to entirely new lines of inquiry, including:

  • Astrophysics – Summit can supply clues related to how heavy elements—including the gold in jewelry and iron in blood—seeded the universe by simulating supernova scenarios several thousand times longer and with 12 times more elements than past projects.
  • Cancer surveillance – By automatically extracting, analyzing and sorting health data in medical images and text-based reports, then pairing it with machine learning algorithms, Summit will help supply researchers with a comprehensive view of the U.S. cancer population at a level of detail typically obtained for clinical trial patient groups.
  • Materials science – Summit can help spur research in next generation materials, including the search for practical superconductors that transmit electricity with no loss of energy. Previously, researchers were limited to simulating tens of atoms because of high computational costs, but Summit can support materials composed of hundreds of atoms.
  • Systems biology – Using a mix of AI techniques on Summit, researchers will be able to study genetic and biomedical datasets to identify patterns in the function, cooperation and evolution of human proteins and cellular systems. Those patterns can give rise to clinical phenotypes, including the traits of diseases like Alzheimer’s, heart disease or addiction, and identifying them can help inform drug discovery processes.

Final analysis

As stated in the Summit announcement, the hybrid CPU–GPU architecture behind Summit, which pairs IBM’s POWER9 processors with NVIDIA’s Tesla GPUs, builds on the approach first utilized in the Titan system, which combined AMD Opteron CPUs with an earlier generation of NVIDIA Tesla GPUs. Generational improvements, along with the development of other enhanced and new technologies, also contributed to the performance improvements Summit is delivering and to its new ability to supercharge scientific research with AI capabilities.

But it’s also worth noting the value that other efforts contributed to Summit. Chief among them was IBM’s decision to open its POWER processor architecture to licensing and collaborative development and to launch the OpenPOWER Foundation in 2013. IBM, NVIDIA and Mellanox (along with Google and Tyan) were all founding members of OpenPOWER and continue to help lead the organization and its 325+ current member companies. In other words, it’s easy to see how the partnerships and collaborations encouraged by OpenPOWER also helped spur and inform the developments leading to Summit.

In addition, other commercial solutions contributed to the project in meaningful ways, like the IBM Storage technologies that underlie Summit’s 250PB of capacity and support its remarkable 2.5TB/s read/write performance. It should also come as no surprise that along with Summit’s launch, IBM announced that customers are already using its hybrid CPU/GPU Power Systems AC922 solutions to support business workloads and use cases.

In point of fact, the old saying that “what goes around, comes around” can be applied to supercomputing and HPC. The latest systems and installations often gain substantial new features with the inclusion of next generation technologies. However, in the case of Oak Ridge’s Summit and IBM, what goes around is coming around far more quickly than ever before. That includes the broad commercial availability of once-difficult-to-obtain technologies and the achievement of once-impossible-to-imagine computing capabilities.

© 2018 Pund-IT, Inc. All rights reserved.