Doug Henschen on Analytics, Big Data & Smart Apps
Strata + Hadoop World announcements by Cloudera, IBM, Google and SAP anticipate cloud growth. Here’s why cloud will be so crucial even if data remains on premises.
“Today, 92% of all IT is happening on premises,” said Mike Olson, Cloudera’s chief strategy officer, in his September 28 keynote kickoff at Strata + Hadoop World in New York. “But we see dramatic growth — nearly 35% compound annual growth — over the next 10 years in public cloud workloads.”
Even in the big data arena, where so much volume (in the form of data warehouses) remains on premises, practitioners and vendors alike are looking to the cloud. Why? Because so much of what will drive innovation, differentiation and value is going to happen in the cloud. That vision was underscored at Strata + Hadoop World, where Cloudera set the tone and IBM, Google, SAP and others added to the list of announcements that centered on cloud deployment options.
Cloudera highlighted recent cloud-friendly enhancements, including the ability to run the Impala database for Hadoop against cloud-native object stores such as Amazon S3 (with Azure Data Lake likely to follow). It’s also previewing a connector for Microsoft Azure users so they can use Power BI with Impala. The company has simplified use of its Cloudera Director cloud-management tool by adding templates for deploying on AWS, Azure and Google. This underscores the company’s commitment to portability across the top three public clouds. In July the company also introduced consumption-based subscription terms, with the ability to meter hourly usage of its Hadoop software for temporary, project-based workloads.
IBM announced IBM DataWorks, a cloud-based platform with options for data ingestion, persistence and analysis. Data sources can be on-premises or in the cloud, structured or unstructured, and batch-oriented or streaming. Options to persist the data include relational database services, NoSQL database services and IBM’s BigInsights Hadoop distro as a service. IBM Watson services and Apache Spark-based machine learning services support data processing and data discovery. DataWorks offers user interfaces for data engineers (DataWorks Connect), data scientists (with Jupyter notebooks and RStudio as part of Data Science Experience), business analysts (Watson Analytics) and app developers (Bluemix services). It’s largely a packaging of existing cloud services, but what makes it a platform is shared data and metadata access, shared governance, and shared spaces for data modeling and analysis.
Google announcements are always about cloud, of course, but at Strata + Hadoop World the company reached out to the enterprise crowd with BigQuery for Enterprise. Aimed at mainstream corporate use, BigQuery for Enterprise adds support for standard SQL (SQL 2011, specifically), including the ability to update, delete and insert rows and columns in BigQuery datasets using SQL. The offering also adds new ODBC drivers for connecting to popular BI tools, along with new access and identity management capabilities. Finally, pricing options include monthly flat-rate pricing, aimed at lowering the cost of long-term use, versus short-term pricing aimed at ephemeral projects.
SAP confirmed its rumored acquisition of Altiscale last week, and it said the business will continue to offer its high-performance Hadoop and Spark services as a separate business unit. Altiscale execs said they expect to expand their business now that they have SAP’s backing and can take advantage of the company’s data center capacity in Europe and elsewhere. As I explained in this take on the deal (before it was confirmed), Altiscale will enable SAP to offer Hadoop and Spark capacity alongside Hana and Hana Vora on the Hana Cloud Platform. Thus, SAP won’t have to turn to other vendors when customers require cloud-based big data infrastructure as part of their next-generation data-driven applications.
MyPOV On Steps Toward the Cloud
All of these moves are positive, but they’re merely next steps toward supporting the sort of hybrid-cloud deployment scenarios that companies will want and need in the years ahead. It’s great that Cloudera has made it easier to deploy its software on the three leading public clouds, but many companies struggle with the complexities of deploying and running Hadoop. They want fully managed services, so they can click to deploy without having to deal with administering the cluster. That’s particularly true when dealing with temporary projects that require infrastructure that companies want to spin up and shut down just as quickly.
IBM gave DataWorks a hyped-up “AI” spin, throwing in mentions of “cognitive” and Watson to cover all the bases. What it seemed to boil down to was the use of machine learning in data processing and data discovery. What I wanted to hear more about was automated model assessment and deployment options. That’s the last mile of turning data into decisions embedded within applications, whether those apps are deployed in the cloud or on premises.
Google is clearly trying to broaden BigQuery’s appeal with those “for Enterprise” enhancements. But with so much data still on premises, the public cloud players should do more than offer secure, dedicated connections to support hybrid deployment scenarios. Microsoft and Oracle both make it a priority to integrate with existing, on-premises IT investments such as data warehouses. Amazon and Google could do more to highlight click-ready integrations with the most popular on-premises data platforms. I know Google has third-party options, but Amazon and Google should both be more vocal and visible about supporting common hybrid scenarios without making it all about moving everything to the cloud.
SAP really had to have its own options for supporting Hadoop and Spark workloads in the cloud, so the Altiscale buy was a smart, turnkey investment. At the same time, SAP still has to play nice and keep its options open with Amazon, Google, Microsoft and IBM. SAP is not a hyper-scale cloud player, so it has done well to partner with all of the above. Given the size and stature of SAP’s customers, these cloud partners would all do well to support hybrid scenarios. That will help build trust and remove cultural barriers that might stand in the way of moving on-premises SAP deployments to the cloud.
As Olson observed, only a fraction of IT activity is currently in the cloud, but that’s where the action is and where the real value is going to be generated. While back-office, transactional data is crucial, it’s generally predictable and well understood. The cloud is where companies are increasingly intersecting with partners and customers and uncovering interesting insights. Whether it’s through mobile apps, social networks, partner portals or third-party data enrichment, Constellation Research believes that 60% of the data that companies consider to be mission critical will reside outside their four walls by 2020. Making use of that data will be the key to driving breakthrough business models based on data monetization and data-driven services.