June 2018: The Difference Six Months Can Make in Supercomputing

By Charles King, Pund-IT, Inc.  June 27, 2018

The IT industry is used to rapid-fire changes, rising and falling fortunes and unusual market developments. However, it’s difficult to think of a sector where changes bordering on the surreal occur more often than in top-end supercomputing and high-performance computing (HPC).

Such shifts are clearly apparent in the latest list of the world’s fastest supercomputers, which also marked the Top500 group’s 25th anniversary. Since the last Top500 report (November 2017), some remarkable changes have taken place. Those include an all-new #1 supercomputing installation, the emergence of a new Top500 list leader and the apparent collapse of a vendor that has led the list for half a decade.

IBM regains leadership with Summit and Coral

The new list’s biggest bragging rights involve the ascendancy of IBM’s Summit installation at the DOE’s Oak Ridge National Laboratory to the peak of the Top500 list. As I noted in a recent Pund-IT Review, Summit contains 4,356 nodes, each equipped with two 22-core IBM Power9 CPUs and six NVIDIA Tesla V100 GPUs, linked with a Mellanox dual-rail EDR InfiniBand network.

Summit’s performance of 122.3 petaflops (on High Performance Linpack (HPL), the benchmark used to rank the TOP500 list) is about a third faster than the 93.015 petaflops achieved by the Sunway TaihuLight system (at China’s National Supercomputing Center in Wuxi). That installation has stood atop the Top500 since it arrived on the June 2016 list.
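The "about a third faster" comparison can be checked with a few lines of arithmetic. The sketch below uses only the two HPL figures quoted above; the function and variable names are illustrative and not part of any Top500 tooling.

```python
def relative_speedup(new_pflops: float, old_pflops: float) -> float:
    """Return how much faster the new result is, as a fraction of the old one."""
    return new_pflops / old_pflops - 1.0


# HPL results in petaflops, as quoted in the article.
summit_hpl = 122.3
taihulight_hpl = 93.015

speedup = relative_speedup(summit_hpl, taihulight_hpl)
print(f"Summit is {speedup:.1%} faster than TaihuLight on HPL")
```

The result is a little under one third, which matches the rounded description in the text.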

More importantly, IBM has noted that Summit’s hybrid Power9/Tesla GPU architecture enables the system to augment conventional supercomputing workloads with machine learning applications, substantially broadening the range of projects and data that Summit can support. That same hybrid architecture (though with different IBM Power System servers) was used in the new #3 system on this Top500 list: the Sierra installation at the DOE’s Lawrence Livermore National Laboratory.

Taken together, Summit and Sierra also enabled IBM to leap to the front of the pack in terms of overall Top500 performance, moving from 19 systems delivering a total of 51.275 petaflops in November 2017 to 18 systems delivering 239.067 petaflops today. The #2 vendor in overall performance is Cray, whose 53 Top500-listed systems deliver a collective 187.798 petaflops.
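To put those aggregate figures in perspective, the following sketch works out IBM's growth since November 2017, its lead over Cray and the average performance per listed system. It relies only on the numbers quoted above; the variable names are illustrative, and the per-system averages are a rough indicator rather than an official Top500 metric.

```python
# Aggregate HPL performance (petaflops) and system counts, as quoted in the article.
ibm_nov_2017_pflops = 51.275
ibm_jun_2018_pflops = 239.067
ibm_jun_2018_systems = 18
cray_jun_2018_pflops = 187.798
cray_jun_2018_systems = 53

# How many times larger IBM's aggregate performance is than six months earlier.
ibm_growth = ibm_jun_2018_pflops / ibm_nov_2017_pflops

# IBM's lead over Cray in aggregate performance.
ibm_lead = ibm_jun_2018_pflops - cray_jun_2018_pflops

# Average performance per listed system for each vendor.
ibm_avg = ibm_jun_2018_pflops / ibm_jun_2018_systems
cray_avg = cray_jun_2018_pflops / cray_jun_2018_systems

print(f"IBM aggregate growth: {ibm_growth:.2f}x")
print(f"IBM lead over Cray: {ibm_lead:.3f} petaflops")
print(f"Average per system: IBM {ibm_avg:.2f}, Cray {cray_avg:.2f} petaflops")
```

The averages show how heavily IBM's total depends on a handful of very large systems, while Cray's total is spread across nearly three times as many installations.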

Interestingly, another IBM system is also the oldest installation in the Top 10 group: the Sequoia installation at the DOE’s Lawrence Livermore Lab, which leverages the company’s BlueGene/Q architecture. It originally topped the Top500 list in June 2012, and its 17.17 petaflops of performance make it #8 on the current list. That’s just behind the second-oldest system in the Top 10: the Cray-based Titan at the DOE’s Oak Ridge, which pushed Sequoia out of the top spot in November 2012.

Lenovo rises, HPE falls

It’s hardly a secret that vendors use placement on the Top500 list for bragging rights, so it’s not unusual to see some tussling for leadership positions. Those range from total number of listed systems to total performance to most energy-efficient systems (in the Green500 list) to placement on the relatively new High-Performance Conjugate Gradients (HPCG) benchmark list, a Top500 project designed to create a new metric for ranking HPC systems.

The emergence of new leading-edge systems is to be expected given the amount of national pride (and funding) that goes into government-sponsored supercomputing installations. Plus, HPC exhibits the same sort of competitive give-and-take that you see in virtually every commercial IT market. Less common are wholesale shifts in leadership positions, including the total number of Top500-ranked systems. However, that’s exactly what the new list revealed, with Lenovo’s 117 installations handily surpassing HPE’s 79 systems.

How unusual is this? Consider that HPE (then HP) had held the topmost position in system share since the June 2013 Top500 list, when the company bumped IBM out of the top spot. IBM, in turn, had booted HPE out of the leadership role on the November 2010 list. In other words, there isn’t a lot of turnover in that #1 position.

The other notable point is the dramatic drop in the number of HPE’s listed systems over the past six months, from 122 on the last list to 79 today. It’s no secret that the company seems to be shifting its go-to-market strategy towards emerging areas, like edge computing and IoT, and away from hyperscale and HPC. Or it may simply be a matter of HPE’s customers lagging the pace of market evolution.
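The scale of that shift is easier to see as percentages. The sketch below uses only the system counts quoted in this article (HPE's 122 and 79, Lenovo's 117) plus the fact that the list contains 500 systems; the variable names are illustrative.

```python
LIST_SIZE = 500  # the Top500 list ranks 500 systems

# System counts as quoted in the article.
hpe_nov_2017 = 122
hpe_jun_2018 = 79
lenovo_jun_2018 = 117

# Absolute and relative decline in HPE's listed systems over six months.
hpe_dropped = hpe_nov_2017 - hpe_jun_2018
hpe_drop_fraction = hpe_dropped / hpe_nov_2017

# Each vendor's share of the June 2018 list by system count.
lenovo_share = lenovo_jun_2018 / LIST_SIZE
hpe_share = hpe_jun_2018 / LIST_SIZE

print(f"HPE lost {hpe_dropped} systems ({hpe_drop_fraction:.1%} of its previous total)")
print(f"System share: Lenovo {lenovo_share:.1%}, HPE {hpe_share:.1%}")
```

In other words, HPE lost roughly a third of its listed systems in a single six-month cycle, while Lenovo now accounts for nearly a quarter of the list.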

For example, the EPFL Blue Brain IV (IBM BlueGene/Q) system at the Swiss National Supercomputing Centre was ranked #372 last November. Its 715.6 TFlop/s of performance earned it the #500 spot on the new list. Plus, HPE has tended to be stronger in commercial HPC solutions that populate the lower end of the Top500 list. In fact, that was one of the drivers for the company’s 2016 purchase of HPC-focused SGI.

However, those points don’t detract in any way from Lenovo’s accomplishment. Along with leading in total systems on the new list, the company’s solutions captured two positions in the top 25, five in the top 100 and thirty-nine in the top 200. Lenovo is also responsible for some notable global HPC installations, including Mare Nostrum at the Barcelona Supercomputing Center, the Niagara system that qualifies as Canada’s largest supercomputer and the Marconi system at Cineca in Italy, which is among the world’s most energy-efficient supercomputers.

Increasing energy efficiency is also the focus of Lenovo’s new Neptune initiative, which the company announced at the International Supercomputing Conference (ISC, where the new list debuted). Neptune encompasses Lenovo’s suite of liquid cooling technologies, including Direct to Node (DTN) warm water cooling, rear door heat exchanger (RDHX) and hybrid Thermal Transfer Module (TTM) solutions, which are designed to deliver peak or high performance for HPC, AI and enterprise workloads.

Final analysis

So, what’s the takeaway from these details gleaned from the latest list? First and foremost, that innovation continues to be alive and well in HPC. That’s hardly headline news given the size of the budgets allocated for these projects. But the fact is that the Top500 list has long served as a marker for compute technologies and capabilities that eventually work their way into commercial markets. Consider, for example, how the IBM Power Systems AC922 servers and other components driving Oak Ridge’s massive Summit system are available for purchase today.

In addition, as in those same commercial markets, the new list shows that competition in HPC continues to be fierce. Both IBM and Lenovo deserve congratulations for their new leadership positions and the innovations they delivered along the way. But at the same time, studying past Top500 results underscores just how tenuous technical dominance can be.

Finally, it’s important to remember that despite Top500’s focus on pure performance, the systems on this newest list, as well as those on previous lists, were built to take on and complete some of the world’s hardest and most complex computing tasks. The capabilities and insights they deliver can and do make real differences in the lives of people, communities, countries and the larger world.

IBM and Lenovo’s achievements aren’t the only significant stories coming out of the new Top500 list. There are also the various issues driving the need for and development of the new HPCG standard. Then there’s China’s continuing reign as the country with the largest number of Top500-listed systems. Finally, consider what the rapid rise of youthful vendors, like Inspur, means to mainstream HPC vendors.

But at the same time, it’s worth remembering that supercomputers like IBM’s Summit and Sierra, and Lenovo’s Mare Nostrum, don’t just run at unbelievable speeds. They run toward and help HPC customers achieve critically important goals.

© 2018 Pund-IT, Inc. All rights reserved.