On Analytics, Data Platforms and Smart Applications
Hadoop security, data management, data governance and analysis options remain works in progress, but a rich ecosystem is emerging to fill gaps and democratize the platform.
Apache Hadoop marks its 10th anniversary as an open source project this year, a fitting milestone to review its (betwixt-and-between) state as an enterprise computing platform.
Inspired by a Google white paper, born at Yahoo and embraced in its early years almost exclusively by Internet giants, Apache Hadoop is today accepted as a de facto standard platform for any enterprise interested in taking advantage of big data. Over the last five years, the top three Hadoop software distributors, Cloudera, Hortonworks and MapR, have cracked all major vertical industry categories and have collectively gained more than 3,000 paying customers for their supported enterprise editions. Tens of thousands more firms are self-supporting free community distributions of Hadoop, though the largest share of these deployments is no doubt devoted to experimentation rather than production use.
Equally significant – and now the fastest-growing part of the Hadoop user community, by most accounts – are the thousands of organizations using cloud-based Hadoop services, such as Amazon Elastic MapReduce, Microsoft Azure HDInsight, Altiscale, Qubole and various managed Hadoop service offerings.
Looking beyond these sheer numbers, I heard plenty of fresh evidence of proven industry use cases at recent Cloudera and Hortonworks analyst events. Cloudera detailed an impressive list of vertical industry use cases at its event while Hortonworks cited unnamed customers at “55 out of the top 100 financial services firms, 75 out of the top 100 retailers, eight out of the top nine telecommunications companies in North America, and eight of the world’s top 20 automotive companies.”
So there’s plenty of reason for confidence in this platform, and we continue to see steady maturation. But Hadoop still has weaknesses and gaps, and plenty of experiments have failed. Even the hand-picked customers attending the Cloudera and Hortonworks events, who shared mostly success stories, had to admit to ongoing challenges:
MyPOV on Hadoop Maturity
So is the glass half empty or half full? In my view you should be optimistic but realistic about this 10-year-old platform. I relate it to my experience as a parent. We never left my son home alone when he was 10 years old, but now that he’s 14, I trust that he’ll be safe and will even get his homework done if we get home late from work. In much the same spirit, an executive at a major e-retailer shared in a recent briefing that his firm isn’t ready to open up wide access to its Hadoop cluster until data-access, governance and security controls are more mature. Maybe if PCI data weren’t involved he’d feel differently? Just as a parent has to know the child, you have to understand your data, your users and your risks. Maturity and trust will come.
Fortunately, we’re seeing a rich ecosystem emerging around Hadoop that will help make data access, data management, data governance and data analysis easier, less coding intensive, more repeatable and, in many cases, more accessible to business users. Some of these capabilities will undoubtedly be duplicated within open source tools. But we’ll also see data-management and governance capabilities that will extend beyond Hadoop, supporting data pipelines and data-driven applications that span multiple platforms.
Next week I’ll be discussing the possibilities and positive developments in the educational webinar, “Democratizing the Data Lake: The State of Big Data Management in the Enterprise.” Set for Tuesday, April 26 at 1pm ET/10am PT, this webinar will delve into data access, data cataloging and metadata management options for Hadoop as well as big data integration and data-prep options. We’ll also discuss Apache Spark and its role in data processing, stream processing and data analysis in the context of Hadoop. Click on the link above to register for the event.