IBM BigInsights: Refining Open Source Big Data for Enterprises

By Charles King, Pund-IT, Inc.  June 29, 2016

I’ve written pretty regularly about IBM’s history of promoting open source technologies. Those efforts began with the company’s support of Linux in the 1990s, resulting in IBM supporting Linux distributions across its entire portfolio of systems and other solutions.

Over the years, the company also contributed to and invested extensively in a wide range of existing and emerging open source projects. Plus, it open-sourced some of its own homegrown technologies, including the Eclipse development platform and the Power microprocessor architecture.

But it’s also worth considering the tactical purpose behind those efforts, and the resulting benefits that have accrued to IBM and its customers and partners. That’s especially true considering the company’s recently announced IBM BigInsights 4.2 platform and the related IBM Big Replicate solutions.

Lending open source a touch of enterprise class

How important was IBM to open source? In the mid- to late-1990s, the public face of Linux was dominated by gearheads who proposed the fledgling OS as a replacement for Windows. Why so? Microsoft was beset by self-inflicted antitrust problems, making it appear uniquely vulnerable. Plus, Apple's rebirth under Steve Jobs, who had returned to the company in 1997, was anything but certain.

In other words, there was probably never a better time for an alternative desktop OS to make a play. The problem was that the concept promoted by Linux evangelists was a long shot of delusional proportions, a bit like a neighborhood hardware shop offering itself as a global alternative to Home Depot. But another viewpoint recognized the fundamental value Linux offered in the data center.

Linux was platform agnostic, meaning that it could replace or displace costly proprietary OSs on almost any server platform. The collaborative, community orientation of open source also meant that development costs were often lower than those of standalone projects. Finally, many younger developers preferred open source. In other words, if you wanted to connect with budding, next-gen IT leaders, you needed a presence in Linux.

Linux was also a prime example of the “When disruption is inevitable, it’s better to disrupt than be disrupted” mantra. IBM understood that point in full and, despite some resistance, methodically supported Linux internally and promoted it externally.

The decision to first develop Linux tools for the venerable IBM mainframe platform (then zSeries, now z Systems) was a brilliant strategy that validated the OS for enterprise organizations and executives who likely considered Linux (if they considered it at all) to be a repository for IT cranks and business misfits.

Today, Linux accounts for over half of the z Systems MIPS that IBM sells annually, and open source efforts inform and invigorate numerous other IBM solution and service offerings. In essence, IBM gave open source a touch of enterprise class that has paid off hugely for the company, its customers, and countless partners and developers.

IBM BigInsights V4.2

So what does all this have to do with IBM’s BigInsights announcements? A couple of things. First and foremost, BigInsights 4.2 is the latest complementary addition to the IBM Open Platform (IOP), a big data platform built on 100% open source Apache ecosystem components, including Hadoop and Spark, that is certified by the Open Data Platform initiative (ODPi).

With BigInsights 4.2, built on IOP 4.2, IBM has created a highly efficient, secure and complete Hadoop/Spark/SQL solution that is ready for enterprise customers. IBM BigInsights 4.2 can be deployed internally, by advanced analytics and data science teams, and on external cloud platforms, meaning it can effectively support either of those scenarios or hybrid cloud deployments.

The addition of IBM Big Replicate expands enterprises' control over data being analyzed on and off premises. It supports critical storage features, including continuous availability, streaming backup and uninterrupted migration. Data consistency is guaranteed across clusters any distance apart, so customers can meet highly demanding SLAs. Big Replicate is also completely non-invasive, requiring no modification to Hadoop source code.

Final analysis

The IBM Open Platform with Apache Hadoop and Spark was originally released in 2015, so some will say that IBM BigInsights 4.2 on IOP 4.2 is simply an update of obviously complementary technologies. But I'd argue that something more important is occurring.

Yes, IBM BigInsights 4.2 and Big Replicate significantly enhance and enlarge IOP 4.2, which is already one of the industry's most expansive big data platforms. But they also highlight how the company is making regular, significant extensions to and investments in IOP. For example, the company's support for Apache Ranger (announced last November) provides secure, centralized policy administration for authentication and authorization, critical capabilities for developers working with IOP and Hadoop.

Another example is IBM Graph (introduced in February), which is designed to enable customers to move information in existing databases into graph databases for complex relationship querying and mapping. Described by the company as "the industry's only enterprise-grade graph database-as-a-service," IBM Graph can be used for applications including modeling and analyzing social networks and building recommendation engines.

In essence, IBM BigInsights 4.2 on IOP 4.2 provides additional, compelling evidence that IBM is doing with Apache Hadoop, Spark and related technologies what it previously did with Linux and other open source solutions: making them more usable, valuable and relevant for enterprise customers.

By enhancing Hadoop and Spark’s readiness for its core clients and their use cases, IBM is bringing open source big data to the head of the enterprise class. Similar efforts around Linux nearly two decades ago had a significant impact on the company and its customers and partners. It seems reasonable to assume that the results of IBM’s open source big data efforts will be equally or even more profound.

© 2016 Pund-IT, Inc. All rights reserved.