Weekly Review: Volume 9, Issue 15, April 3, 2013
By Charles King, Pund-IT, Inc.
The problem with "blanket" generalities is that, while they can initially seem warm and comforting, they often lead to fuzzy thinking. More importantly, though generalities can sometimes express a concept for which no formal framework exists, commercial exposure and a lack of central control open them to appropriation by any interested party. In other words, if an idea isn't clearly stated and understood to begin with, it is hardly unusual for interested parties to define it on terms beneficial to themselves.
That's hardly surprising, but what does it have to do with "big data"? Remember that attention to the subject of big data began to coalesce as studies like "The Expanding Digital Universe" (published in 2007 by IDC, the University of California and EMC, and updated in 2011) explored the challenges and opportunities organizations faced in capturing full value from their ever-growing storage infrastructures and investments. The tangible connection between expanding information assets and big data lay in the nature of the information itself.
As "The Expanding Digital Universe" study noted, some 80% of the information created by average organizations is unstructured or semi-structured data of the sort that can't be contained, mined or analyzed by conventional relational databases. So it was natural that startups like Greenplum, Cloudera and Splunk, which develop solutions leveraging Hadoop, NoSQL and other emerging analytics technologies appropriate for non-traditional data sets, became big data trendsetters.
But there are obvious problems in this paradisiacal view. First, increasingly unmanageable, even metastatic growth isn't impacting unstructured and semi-structured information alone—it's affecting virtually all data in virtually all organizations. Second, and perhaps most importantly, enterprises trying to get their arms around big data (metaphorically speaking) are hunting for ways to holistically manage and analyze information across their entire organizations, not looking to create yet more costly, isolated data siloes.
IBM’s New Big Data Solution: An Evolutionary Leap?
That is why we are so intrigued by IBM’s evolving big data portfolio and its new/enhanced solutions. In short, the company announced:
- New DB2 with BLU Acceleration—puts innovative 10X compression and "data skipping" technologies to practical use, supercharging traditional data warehouse and business intelligence performance. As a result, IBM reports speeding the reporting and analytics performance of its DB2 10.5 database solutions by 8X-25X, all without changes to indexes, aggregates, tuning, or SQL/schemas.
- Enhanced Big Data Platform—improves the capabilities and ease of use of IBM's InfoSphere BigInsights and InfoSphere Streams. This was achieved by adding an ANSI SQL interface to BigInsights (allowing clients with existing SQL skills/applications to get up and running faster), as well as new GPFS and high availability features critical for enterprises. In addition, the new Streams 3.1 boosts performance by 2X-10X and offers features that simplify development, deployment and integration processes.
- New PureData System for Hadoop—is designed to make life easier for customers that want to leverage Hadoop. IBM achieves that by pre-integrating its Netezza-based big data solutions at the factory, reducing deployment times by up to 8X (compared to custom-built solutions). IBM also claims the new PureData System is the first Hadoop appliance with built-in analytics and accelerator technologies, and the only such solution with built-in archiving tools.

What does IBM hope to achieve with these new solutions? Certainly, they represent an evolutionary step forward for IBM's analytics and information management solutions and services. In fact, it is hard to remember an announcement that brought so many enhancements and new capabilities to that portfolio. But we also believe these new offerings qualify as tools businesses can utilize to spark their own evolution and success.
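For readers unfamiliar with the "data skipping" concept behind BLU Acceleration, the general technique can be sketched simply (this is a generic illustration of the idea, not IBM's actual implementation): a database keeps minimum/maximum metadata for each block of column values, and a query scan skips any block whose value range cannot possibly satisfy the predicate.

```python
# Generic sketch of "data skipping" (illustrative only, not IBM's BLU code):
# record (min, max) per block of column values, then skip whole blocks
# whose range cannot satisfy the query predicate.

def build_synopsis(values, block_size):
    """Split a column into blocks and record (min, max, values) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]

def scan_greater_than(synopsis, threshold):
    """Return values > threshold, skipping blocks whose max <= threshold."""
    matches, blocks_scanned = [], 0
    for blk_min, blk_max, block in synopsis:
        if blk_max <= threshold:
            continue  # entire block skipped: no value in it can match
        blocks_scanned += 1
        matches.extend(v for v in block if v > threshold)
    return matches, blocks_scanned

# Example: 12 values in 3 blocks; only the one block that can contain
# values > 70 is actually read.
column = [10, 20, 30, 40, 55, 60, 65, 70, 80, 85, 90, 95]
synopsis = build_synopsis(column, block_size=4)
result, scanned = scan_greater_than(synopsis, 70)
print(result, scanned)  # [80, 85, 90, 95] 1
```

The payoff is that I/O scales with the number of blocks that might match rather than the full table size, which is one reason such techniques can deliver large speedups without changes to indexes or SQL.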
This all sounds pretty impressive, but how are IBM's new and enhanced solutions likely to fare in what is, admittedly, a crowded, bustling big data market? That needs to be considered on a case-by-case basis. In the traditional enterprise database market, IBM is obviously aiming at Oracle, and while the potential performance of DB2 with BLU Acceleration is remarkable, inspiring migrations from one database to another is a hard sell.
IBM has some notable weapons for such efforts, particularly its Oracle-to-DB2 migration services (which claim up to 98% of code can be migrated without rewriting). In the short term, the new solution is likely to primarily benefit current IBM customers, but if DB2 with BLU Acceleration truly delivers its promised goods, we expect it to draw significant interest among Oracle clients.
The other two solutions reflect broader industry trends. There is certainly no shortage of Hadoop-based appliances. That said, IBM's core Netezza data warehouse technology is well established and understood in this market, so extending the platform's use cases with a Hadoop distribution makes perfect practical and strategic sense.
Enhancing InfoSphere BigInsights with an ANSI SQL interface is another sensible move, as it should help customers interested in big data get additional mileage from their already sizable SQL investments. Meanwhile, simplifying InfoSphere Streams and boosting its performance by 2X-10X is likely to increase interest in the somewhat arcane subject of real-time big data analytics.
Proponents often describe big data as the "next big thing" in IT. That is, it's a technology so deeply attuned to the market's current state and emerging needs that it will inspire a transitional, evolutionary leap forward analogous to previous transitions from mainframes to client/server systems, to PCs, to the Internet, to cloud computing. Could that be the case? I believe so.
Fundamentally, big data technologies arose as the volumes of information organizations create/store—particularly unstructured and semi-structured data—became increasingly unwieldy and problematic. Early solutions were specifically designed to address just those problems.
But truly effective big data technologies and strategies must extend beyond that material to encompass the structured information stored and analyzed in traditional relational databases. In fact, unless solutions can effectively address and analyze all of an organization's information resources, users risk creating little more than new classes of costly, inefficient information siloes. In such cases, so-called big data solutions will deliver little more than tactical "hops" rather than the transformative "leaps" they promise.
The essence of IBM's trio of big data announcements—the new DB2 with BLU Acceleration, the InfoSphere BigInsights and Streams enhancements, and the new PureData System for Hadoop—demonstrates the company's three-dimensional (3D) view of big data. In contrast to some competitors, IBM believes big data isn't some new issue requiring emerging or arcane technologies. Instead, it views big data as a fundamental challenge that stretches across the IT landscape, tangibly affecting the technology market as a whole and businesses of every sort and size. By successfully developing innovative big data solutions across its various spheres of influence, IBM helps its customers pursue and ensure their own success.