DATA TO DECISIONS

Doug Henschen on Analytics, Big Data & Smart Apps

IBM Joins Apache Spark Bandwagon (and Coopetition)

IBM stole the day-one headlines at Spark Summit 2015 in San Francisco with a big endorsement of the open-source, big-data-analysis platform. But it’s sure to be a selective embrace, as IBM, like other commercial vendors, plans to offer its own software and services on top of Spark.

IBM threw its significant weight behind Apache Spark on Monday, calling the in-memory platform “potentially the most significant open-source project of the next decade.”

Among the moves announced, IBM will offer Spark as a service on its BlueMix cloud, open a Spark development center in San Francisco and redirect more than 3,500 IBM researchers and developers to work on Spark-related projects. IBM also promised to educate more than 1 million data scientists and data engineers on Spark through community partnerships and support for online courses.

The big news on day one of Spark Summit 2015 was IBM’s announcement it will throw its weight behind the open source platform.

All of the above is great news for the Spark community. But is Databricks, the Spark development, certification and support firm, in danger of being eclipsed by big companies embracing the platform? Spark is the darling of the conference circuit this year, with Databricks executives showing up at Informatica World, Alteryx Inspire15 and many other events as keynote speakers. Even when official representatives aren’t there, Spark is often mentioned as a “Spark inside” enabler of new big data initiatives, as was the case at the Teradata Influencers’ Summit.

But the embrace of Spark isn’t always wholehearted. That’s because the platform supports multiple modes of analysis, including machine learning, SQL, R, graph and streaming. Hadoop distributor Cloudera, for example, was early to jump on the Spark bandwagon, but it touts the platform’s machine learning capabilities, not Spark SQL, which presents a threat to Cloudera’s Impala SQL-on-Hadoop component. Hortonworks and MapR also support Spark, but they give equal billing to Hive and Drill, their favored SQL-on-Hadoop options, while invariably showing Apache Storm in architectural diagrams as the streaming option instead of (or in addition to) Spark Streaming.

I’m set to hear more about IBM’s specific Spark plans here in San Francisco this week, but at last week’s Hadoop Summit in San Jose, a few IBMers informally told me the company is mostly interested in using the Spark in-memory platform and machine learning options. As for Spark SQL and Spark Streaming? These are two areas where IBM can offer its own technologies. What’s more, IBM is contributing its own SystemML machine learning software to the Spark community, building influence in this core area.

With a Spark service now available on BlueMix and thousands of IBMers now working on Spark-based applications, Databricks will see new competition to its eponymous Databricks platform (formerly called Databricks Cloud), which runs on Amazon Web Services. IBM’s move is also a challenge to analytics leader SAS, which has spent the last three years developing SAS Visual Analytics and Visual Statistics as its choice for in-memory big-data analysis (either on top of Hadoop or on a dedicated distributed cluster).

Even if commercial plans lie behind IBM’s embrace of Spark, Databricks executives weren’t about to throw cold water on any endorsements of the platform. “It’s great to see some of the large vendors in the community throwing their weight behind Spark,” Databricks executive Arsalan Tavakoli-Shiraji told me last week. “SAP is integrating Hana with Spark, IBM is embracing it, and Intel is also making a lot of contributions, so it’s great to see the community growing.”

Stay tuned for more from me this week from IBM, SAS and the Spark Summit as the fast-moving big-data analysis world moves even faster.

One comment on “IBM Joins Apache Spark Bandwagon (and Coopetition)”

  1. Doug Henschen
    June 15, 2015

    I asked my SAS contacts if the company has an official position on Spark, and this was the reply from product manager Mike Ames: “While Spark is currently an immature technology, it shows promise with rapid adoption as a result of its data processing capabilities. SAS and Spark are very capable of coexisting, with products such as SAS Data Loader for Hadoop, which can push transform logic to Spark.”

Information

This entry was posted on June 15, 2015 in analytics, big data, machine learning, streaming.