DATA TO DECISIONS

Doug Henschen on Analytics, Big Data & Smart Apps

Hadoop Enters Awkward Teenage Years

Hadoop Summit Event Report: Hortonworks highlights technology progress and rising enterprise adoption amid growing pains and intra-Hadoop-family conflict.

Hadoop is enterprise viable and growing quickly. That’s the message Hortonworks CEO Rob Bearden communicated from the start of his opening keynote at the June 9-11 Hadoop Summit in San Jose, CA. Billing Hadoop as “one, central platform for batch, interactive, and real-time applications,” he focused mainly on the big-picture possibilities for innovation and “game-changing new business models.” But he also acknowledged “incredible progress” in Hadoop basics such as operational performance, security and data governance over the past 18 months.

Progress certainly has been made, but Hadoop is best described today as entering its awkward teenage years. Its (analytical) voice is breaking, and it sometimes throws tantrums when faced with adult (enterprise workload) expectations. Talking to customers at the event (running on both Hortonworks and Cloudera distributions), I heard tales of cluster crashes and cranky, uncooperative operational behavior.

Hortonworks CEO Rob Bearden kicks off Hadoop Summit 2015.

Hortonworks introduced the 2.3 release of the Hortonworks Data Platform (HDP) at Hadoop Summit. But executives didn’t spend much keynote time talking about the improvements to Ambari systems management, project Ranger security and access controls, or Apache Atlas metadata management. The list of upgrades also includes broader SQL coverage, visualization of SQL queries, and easier installation and configuration of HDFS, YARN, Hive and HBase.

To discuss such basics would only underscore the relative immaturity of Hadoop. Instead Bearden and others focused on support for cutting-edge options like Apache Spark and streaming analysis with Kafka and Storm. All three were featured in on-stage demos.

Thankfully, Hortonworks also didn’t beat its chest in public about the Open Data Platform (ODP) initiative. Announced in February, ODP is a partnership led by Hortonworks, IBM, Pivotal and Infosys and since joined by Telstra, BMC, DataTorrent, Syncsort, Unifi, zData, and Zettaset. The goal is to standardize on a stable core of Apache Hadoop components that all members use, promoting interoperability and speeding adoption of Hadoop.

ODP members insist that they have invited all Hadoop distributors to join the group, but the starting point was agreeing to interoperate with HDFS, YARN, MapReduce and Ambari. Ambari is the sticking point: Cloudera and MapR have their own management software, so they’ve declined to join ODP. The choice of Ambari reflects Hortonworks’ “100% open source” ethos, but this sort of intra-Hadoop-family drama does not inspire confidence in Hadoop. In my view, Hortonworks and the other ODP partners would do well to pursue this initiative from a purely technical perspective rather than using it as some sort of branding seal of approval.

As the proud parent of a maturing Hadoop distribution, Hortonworks was smart to put the public emphasis at Hadoop Summit on enterprise adoption. More than 75 Hortonworks customers presented at the event. Luminaries included insurer Progressive, oilfield services firm Schlumberger, online auto-buying site TrueCar, telco giant Verizon, and web-measurement firm Webtrends. Progressive is analyzing all of its Snapshot driving data, IoT style, to help good drivers save money on car insurance. TrueCar studies everything from the color of a car to localized buying trends to price vehicles accurately. In a real-time scenario, Webtrends can spot abandoned shopping carts and lost shoppers on e-commerce sites within seconds so retailers can respond and try to save the sale before customers leave the site.

MyPOV on Hortonworks’ Progress

Hortonworks is nothing if not consistent. It sticks to its mission of delivering 100% open source Hadoop software. In some cases that means it delivers features and functions only after chief rival Cloudera has developed something first. Ranger, for example, provides access controls that Cloudera previously introduced in Sentry. And Cloudbreak, the cloud-deployment tool/service introduced in HDP 2.3, follows in the footsteps of Cloudera Director, introduced last fall.

First does not always mean best. Ranger, for example, provides more granular access control (and auditing) than does Sentry across all the components of Hadoop. Given the impressive and growing customer list, it’s clear plenty of companies like Hortonworks’ approach and are confident in the roadmap.

As for some of the latest features from Hortonworks and competitors including MapR, let’s hope routine system-admin and security features won’t be what ultimately differentiates Hadoop distributions. These are the sorts of features that enterprise customers simply expect to be there. When Hadoop is truly mature, the boundaries between the menagerie of projects within Hadoop will disappear, and the complexity that still confronts Hadoop administrators and day-to-day users will diminish. Competition will then center on ease of data management, ease of workload management, breadth of analytical capabilities, and emerging, next-generation big data applications.

This entry was posted on June 10, 2015, in analytics, big data platforms, streaming data.