IBM Joins Apache Spark Bandwagon (and Coopetition)

IBM stole the day-one headlines at Spark Summit 2015 in San Francisco with a big endorsement of the open-source, big-data-analysis platform. But it’s sure to be a selective embrace, as IBM, like other commercial vendors, plans to offer its own software and services on top of Spark.

IBM threw its significant weight behind Apache Spark on Monday, calling the in-memory platform “potentially the most significant open-source project of the next decade.”

Among the moves announced, IBM will offer Spark as a service on its BlueMix cloud, it will open a Spark development center in San Francisco and it will redirect more than 3,500 IBM researchers and developers to work on Spark-related projects. IBM also promised to educate more than 1 million data scientists and data engineers on Spark through community partnerships and support for online courses.

The big news on day one of Spark Summit 2015 was IBM’s announcement that it will throw its weight behind the open-source platform.

All of the above is great news for the Spark community. But is Databricks, the Spark development, certification and support firm, in danger of being eclipsed by big companies embracing the platform? Spark is the darling of the conference circuit this year, with Databricks executives showing up at Informatica World, Alteryx Inspire15 and many other events as keynote speakers. Even when official representatives aren’t there, Spark is often mentioned as a “Spark inside” enabler of new big data initiatives, as was the case at the Teradata Influencers’ Summit.

But the embrace of Spark isn’t always wholehearted. That’s because the platform supports multiple modes of analysis, including machine learning, SQL, R, graph and streaming. Hadoop distributor Cloudera, for example, was early to jump on the Spark bandwagon, but it touts the platform’s machine learning capabilities, not Spark SQL, which presents a threat to Cloudera’s Impala SQL-on-Hadoop component. Hortonworks and MapR also support Spark, but they give equal billing to Hive and Drill, respectively, their favored SQL-on-Hadoop options, while invariably showing Apache Storm in architectural diagrams as the streaming option instead of (or in addition to) Spark Streaming.

I’m set to hear more about IBM’s specific Spark plans here in San Francisco this week, but at last week’s Hadoop Summit in San Jose, a few IBMers informally told me the company is mostly interested in the Spark in-memory platform and its machine learning options. As for Spark SQL and Spark Streaming? These are two areas where IBM can offer its own technologies. What’s more, IBM is contributing its own SystemML machine learning software to the Spark community, building influence in this core area.

With a Spark service now available on BlueMix and thousands of IBMers now working on Spark-based applications, Databricks will see new competition to its eponymous Databricks platform (formerly called Databricks Cloud), which runs on Amazon Web Services. IBM’s move is also a challenge to analytics leader SAS, which has spent the last three years developing SAS Visual Analytics and Visual Statistics as its choice for in-memory big-data analysis (either on top of Hadoop or on a dedicated distributed cluster).

Even if commercial plans lie behind IBM’s embrace of Spark, Databricks executives weren’t about to throw cold water on any endorsements of the platform. “It’s great to see some of the large vendors in the community throwing their weight behind Spark,” Databricks executive Arsalan Tavakoli-Shiraji told me last week. “SAP is integrating Hana with Spark, IBM is embracing it, and Intel is also making a lot of contributions, so it’s great to see the community growing.”

Stay tuned this week for more from IBM, SAS and Spark Summit as the fast-moving big-data analysis world moves even faster.

1 Comment

Doug Henschen says:

June 15, 2015 at 1:06 pm

I asked my SAS contacts if the company has an official position on Spark, and this was the reply from product manager Mike Ames: “While Spark is currently an immature technology, it shows promise with rapid adoption as a result of its data processing capabilities. SAS and Spark are very capable of coexisting, with products such as SAS Data Loader for Hadoop, which can push transform logic to Spark.”

Published by Doug Henschen