IBM’s Bayesian Optimization Accelerator and the Journey to Commercial Viability

By Charles King, Pund-IT®  December 16, 2020

How complex computing technologies become commercial solutions is often unclear. In most instances, launch announcements mark the first time the public hears about such products. However, that is not always the case. For example, the development of IBM’s recently introduced Bayesian Optimization Accelerator can be traced through several of the company’s earlier innovations. Following that trail offers some insight into how a technology that some people might consider obscure can become a commercially viable product.

What is Bayesian optimization?

First published in 1763, after the death of its originator, the Reverend Thomas Bayes, Bayes’ theorem determines the probability of an event based on knowledge of conditions that might influence or be related to the event. As noted in Stephen Meserve’s post on IBM’s new solution, Bayesian methods are commonplace in mathematics, but applying standard solutions, like Monte Carlo search, to product design problems is often challenging or impractical.
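To make the theorem concrete, here is a minimal sketch in Python. The scenario and all three input probabilities are hypothetical values chosen for illustration; they do not come from IBM or from the post cited above. The function applies P(A|B) = P(B|A) P(A) / P(B), where B is an observed condition (say, a design failing a quick check) and A is the event of interest (the design actually being defective).

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes' theorem.

    prior               -- P(A), belief in the event before seeing evidence
    likelihood          -- P(B | A), chance of the evidence if the event holds
    false_positive_rate -- P(B | not A), chance of the evidence otherwise
    """
    # Total probability of observing the evidence B
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of designs are defective, the check flags 90%
# of defective designs and wrongly flags 5% of sound ones.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(defective | flagged) = {p:.3f}")
```

With these assumed inputs the updated probability is about 15 percent, far higher than the 1 percent prior but well short of certainty. That kind of updating is what Bayesian methods repeat each time new evidence arrives.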

With that in mind, IBM designed the Bayesian Optimization Accelerator to find optimal solutions for real-world design challenges in less time and with fewer resources than other solutions. It can scale to problems with orders of magnitude more dimensions and tackle highly complex problems. Plus, IBM’s solution can determine design points with fewer samples than other methods require, delivering results faster and more cost-effectively.
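For readers unfamiliar with the general technique, the following is a textbook-style sketch of a Bayesian optimization loop, not IBM’s implementation, whose internals are not described here. It assumes a one-dimensional toy objective standing in for an expensive simulation, a Gaussian-process surrogate with a fixed RBF kernel, and an expected-improvement rule for choosing the next sample. All function names, the kernel length scale and the starting points are illustrative assumptions.

```python
import math
import numpy as np


def objective(x):
    # Hypothetical stand-in for an expensive simulation; its peak is at x = 0.3.
    return -(x - 0.3) ** 2


def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential similarity between every pair of points in a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Zero-mean Gaussian-process posterior mean and standard deviation.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(K, K_s)            # K^-1 K_s
    mean = solved.T @ y_train
    var = 1.0 - np.sum(K_s * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    # Expected gain over the best value seen so far (maximization).
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(n_iter=10):
    candidates = np.linspace(0.0, 1.0, 201)     # design points we may sample
    x = np.array([0.05, 0.5, 0.95])             # assumed starting samples
    y = objective(x)
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, candidates)
        ei = expected_improvement(mean, std, y.max())
        x_next = candidates[np.argmax(ei)]      # most promising next sample
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    return float(x[np.argmax(y)]), len(x)


best_x, n_evals = bayes_opt()
print(f"best x found: {best_x:.3f} after {n_evals} evaluations")
```

The point of the sketch is the sample budget: each new evaluation is chosen using everything learned from the previous ones, rather than by sweeping the design space blindly.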

However, developing the Bayesian Optimization Accelerator as a commercial solution required the efforts of several IBM teams and business units.

Hardware foundation – IBM’s POWER9-based AC922

Introduced late in 2017, the Power Systems AC922 was the first commercial IBM server to utilize the company’s new POWER9 processors, as well as NVIDIA Tesla V100 (Volta) GPUs. The AC922 also incorporated features developed for the Summit and Sierra supercomputers deployed at the U.S. Department of Energy’s Oak Ridge and Lawrence Livermore National Laboratories, respectively. Summit led the Top500.org list of top-performing supercomputers from June 2018 until it was displaced in June 2020.

As noted in a blog by Ron Gordon at the time of the launch, the “AC” designation in the AC922 stands for Accelerated Computing, reflecting the combined performance of two POWER9 CPUs, high I/O and memory bandwidth, and up to four NVIDIA GPUs. The AC922 is particularly well suited for Artificial Intelligence applications, including machine learning and deep learning, using Linux and frameworks like Torch and Caffe.

The AC922 continues to deliver superb performance for AI workloads and provides the hardware foundation for the new Bayesian Optimization Accelerator.

IBM Research’s Bayesian optimization helps drive Power innovations

In June 2020, IBM discussed advances achieved by a team of engineers in its High-Speed Bus Signal Integrity (HSB-SI) organization. Implementing IBM Bayesian optimization software, a machine learning tool developed by IBM Research, the team was able to dramatically reduce the number of simulations required to reach the optimal configuration for chip-to-chip communications.

The legacy “brute force” processes that are typically used to analyze chip-to-chip design channels are engineering- and simulation-intensive and take days to arrive at an optimal combination. By running IBM Research’s software on a Power Systems server, the HSB-SI engineers were able to dramatically cut the amount of time required to deliver the same results and used far fewer compute resources to get there.

I wrote about the team’s achievement earlier this year. Using a 10-core Power Systems server with IBM Bayesian optimization software reduced the compute time required for one job from nearly eight days to 80 minutes. When results had to be delivered within 100 minutes, a Bayesian optimization-enabled Power Systems server with nine cores successfully completed that task, while brute force techniques required a system with 1,126 cores to achieve the same results.
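The scale of those gains is easier to see as ratios. The short calculation below uses only the figures quoted above. Because the original job took “nearly” eight days, treating it as a full eight days is an assumption that makes the time ratio an upper bound.

```python
MINUTES_PER_DAY = 24 * 60

# "Nearly eight days" is rounded up to eight full days (assumption),
# so the resulting speedup is an upper bound.
brute_force_minutes = 8 * MINUTES_PER_DAY
bayes_minutes = 80
time_speedup = brute_force_minutes / bayes_minutes

# Cores needed to meet the 100-minute deadline in each approach.
brute_force_cores = 1126
bayes_cores = 9
core_ratio = brute_force_cores / bayes_cores

print(f"time reduction: up to about {time_speedup:.0f}x")
print(f"core reduction: about {core_ratio:.0f}x")
```

In other words, the job ran up to roughly 144 times faster, and the deadline-bound case needed roughly 125 times fewer cores.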

Enter IBM’s Bayesian Optimization Accelerator

The new IBM solution is a dedicated Power Systems appliance optimized for accelerating Bayesian search calculations. The appliance’s minimum technical requirements include an IBM Power Systems AC922 with dual POWER9 CPUs, 256 GB of memory, two NVIDIA V100 GPUs, two 1.6 TB NVMe SSDs and two 1 Gb Ethernet ports. Software requirements include RHEL 8, CUDA 11 and ESSL 6.3. Appliances can also be configured to meet specific technical and design requirements.

Key features include enabling task parallelization to reduce CPU and wall clock time, scaling to orders of magnitude more dimensions than conventional open source Bayesian libraries, determining design points with fewer samples than methods like grid and random search require, and ensuring traceability to models to build trust in model design methodologies. IBM also notes that the new appliance offers improved throughput and is easy to add to existing HPC clusters.
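The comparison with grid search is worth a quick illustration, because the cost of an exhaustive grid grows exponentially with the number of design dimensions. The sketch below assumes a hypothetical resolution of 10 candidate values per dimension; the function name and the dimension counts are illustrative only.

```python
def grid_evaluations(points_per_dim, dims):
    # A full grid evaluates every combination of values across all dimensions.
    return points_per_dim ** dims


# Hypothetical resolution: 10 candidate values per design parameter.
for dims in (2, 5, 10, 20):
    print(f"{dims:>2} dimensions -> {grid_evaluations(10, dims):,} evaluations")
```

At 10 dimensions the grid already requires 10 billion evaluations, which is why methods that choose each sample deliberately become attractive as design problems grow.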

Final analysis

While some products appear to spring fully formed from the minds of inventors and developers, far more evolve and are assembled from various, often disparate individual and team efforts. IBM’s new Bayesian Optimization Accelerator is clearly among this latter group. The core Power Systems AC922 platform was enabled by IBM’s new generation POWER9 CPUs, the company’s efforts in designing the Summit and Sierra supercomputers and its strategic partnerships with NVIDIA, Mellanox and others.

On the software side, the new solution owes much to the work of IBM Research developers. Through their efforts, and in concert with IBM Power Systems hardware, the company’s Bayesian optimization software massively improved the speed and efficiency of what had been highly complex and resource-intensive design processes.

Today, companies have access to commercial iterations of what were once speculative IBM projects. As those organizations put the Bayesian Optimization Accelerator to work, its journey is likely to be eclipsed by the new and unique destinations it helps IBM customers achieve.

© 2020 Pund-IT®. All rights reserved.