IBM Storage – Speeding and Simplifying the AI Journey

By Charles King, Pund-IT Inc., July 15, 2020

Artificial intelligence (AI) projects can incorporate a wide variety of computing and data storage technologies and services. However, enterprises hoping to use AI to gain or add business value must plan and enable these projects with particular care. Commercial solutions vary significantly in quality and performance. More importantly, decision makers are often uncertain about what AI projects require in order to succeed.

IBM’s recent announcement of new and updated storage solutions optimized for AI underscores this point. The announcement cites a Forrester survey of global IT, data and line-of-business decision makers, which found that more than half of the respondents didn’t know what their AI data requirements were. IBM has addressed this challenge by developing an information architecture (IA) designed to help customers effectively collect and organize data, gain deeper insights through AI-enabled data analysis and then use those insights to improve business outcomes.

IBM calls the process by which businesses harness the power of AI the “AI journey,” and its new storage solutions are clearly designed to help customers embark on and complete that journey. Let’s take a closer look at them.

Building bigger, better data lakes

Data lakes came to the fore in the early days of “big data” solutions as valuable alternatives to traditional data warehouses. A key advantage of data lakes is their ability to accommodate semi-structured and unstructured data, such as audio/video streams, call logs, click streams, sentiment data and social media data. Supporting a wider variety of data enables more robust AI model training, with the goals of increasing accuracy and reducing bias.

To that end, IBM’s new Elastic Storage System (ESS) 5000 is designed to support the data collection and long-term storage requirements of massive data lakes. The ESS 5000 runs IBM Spectrum Scale, the company’s global file system for managing unstructured data at scale, which can perform archive and analytics processes on data in place. The ESS 5000 is also built on IBM’s POWER9 silicon and delivers 55 GB/s of performance in a single 8-disk enclosure node. In addition, systems can scale to enormous yottabyte-class configurations (one yottabyte equals one billion petabytes).

The ESS 5000 follows the NVMe all-flash ESS 3000, which IBM introduced last October and which is designed for more modest data lake environments and data analysis. The ESS 3000 also runs IBM Spectrum Scale and comes as a scalable 2U building block that delivers 40 GB/s of performance. IBM noted that the ESS 3000 and ESS 5000 are optimized for different stages of the AI journey and that both can coexist in the same Spectrum Scale-based data lake. Together, the two systems highlight the company’s recognition that there is no single path or approach to AI, and its intention to support customers’ AI needs on-premises, at the edge and in the cloud.

Optimizing data performance for hybrid cloud

Along with using freshly created and collected information in AI initiatives, many companies are working to incorporate archived historical data to improve training outcomes. IBM has improved its Cloud Object Storage (COS), which is designed to provide cost-efficient on-premises and hybrid cloud object storage. These enhancements can potentially support faster AI data collection and tighter integration with AI, big data and HPC workflows.

IBM has completely modernized the COS storage software engine with an upgrade designed to increase system performance to 55 GB/s in a 12-node configuration. According to IBM, the upgrade can improve reads by up to 300% and writes by up to 150% (depending on object size) over the prior generation. IBM COS systems will also support new high-capacity Shingled Magnetic Recording (SMR) drives, which allow up to 1.9 PB of storage in a 4U disk enclosure. Along with these enhancements, Spectrum Scale can minimize duplicate copies when moving data from object storage environments.

For deeper analysis of data assets, companies can use IBM Spectrum Discover for cataloging and indexing file and object metadata. Spectrum Discover can ingest from and export to heterogeneous storage systems, such as EMC Isilon and NetApp filers, and also works with IBM Watson and IBM Cloud Pak for Data solutions. Through Spectrum Discover’s API, AI, big data and analytics applications can use its metadata catalogs and indexes. IBM also noted that Spectrum Discover will soon be available for Red Hat OpenShift environments, enabling it to be deployed portably and flexibly across private and public clouds, in any environment OpenShift supports, and in virtual machines.

Final analysis

Though robust computing technologies obviously play critical roles in AI initiatives and strategies, flexible, scalable storage solutions are equally important. Whether an AI project focuses on contemporaneous data collected from business systems and sensors or uses historical information to improve the accuracy of AI training models, success depends on the ability to effectively collect, organize and analyze those data assets.

Just as no two organizations are completely alike, no two businesses complete their AI journeys in quite the same way. Before embarking on those efforts, companies would do well to consider the data and AI systems they plan to use and the vendors whose guidance they will rely on. IBM’s latest storage solutions, including the new Elastic Storage System (ESS) 5000 and the upgraded Cloud Object Storage engine, along with Spectrum Scale and Spectrum Discover, demonstrate the company’s continuing dedication to enabling and ensuring the success of its enterprise customers’ AI journeys.