Comments on: Self-Service Data Prep Options Proliferate
http://doughenschen.com/2015/07/29/self-service-data-prep-options-proliferate/
Doug Henschen on Analytics, Big Data & Smart Apps

By: Jean-Michel Franco, Sun, 02 Aug 2015 11:02:36 +0000

This is very inspiring and very useful, Doug, now that data prep is becoming a hot topic and tools are proliferating. Thanks for that. I especially like your insights in the last part.
As you mention, more and more people across organizations are spending too much time finding and adjusting the available data sets to meet their information needs.
They may be users of BI or data discovery tools. In this respect, it is interesting to note that the Qlik product you mention is not called data prep but rather Smart Data Load. After looking at the demo, I feel its purpose is mostly to load data into the Qlik associative engine, a capability that is very specific to Qlik's in-memory engine. Although this is very useful in the context of Qlik, I’m unsure it aims to compete with the “traditional” data profiling and transformation functions found in other tools.
They may be users of advanced analytics, such as data scientists, which explains why advanced analytics providers like Alteryx and Datameer are introducing data prep features into their toolsets.
They may be users of integration tools, as in your SnapLogic example. At Talend, we have introduced self-service capabilities in our iPaaS platform, because cloud-based applications tend to make it easier for non-experts to collaborate on design and development tasks that used to be purely IT tasks.

But overall, I think data prep should be considered not just as an embedded feature of data-centric products and platforms, but as a service that an organization has to set up and provide to its business users. Data is everywhere, and more and more people need it for their daily work. While some of them may have tools like the ones mentioned above, many others use personal tools, especially Excel, and they spend a lot of their time on this work. Because it is not under control, we hear more and more horror stories about leaks: for example, WikiLeaks has a Sony section that includes a search engine over very sensitive data leaked from Excel files.

So we at Talend feel that data prep should go beyond providing a productivity tool for individual users. Data is a shared asset, and the good news is that tools now allow us to consume it as a self-service. But in most organizations, where information maturity is still at an early phase, achieving this should be a collaborative effort in which data experts are responsible for organizing information reuse and guiding business users on their road to autonomy. Successfully providing data as a self-service requires an organization empowered by collaborative tools that can share reusable data catalogs and data preparation tasks across the use cases mentioned above.