Doug Henschen on Analytics, Big Data & Smart Apps
Informatica addresses big-data integration, governance and security through a subscription-based portfolio. But will big data remain a separate world?
One irrefutable trend in 2015 has been growing enterprise adoption of big data platforms including Hadoop and NoSQL databases. Yet as companies move beyond big data pilot projects and try to do more with their deployments, many struggle to achieve repeatability and productivity. Short on experienced talent, companies look for any way possible to avoid one-off coding and development work.
Enter Informatica, which last month introduced Informatica Big Data Management, a three-part offering aimed at big data integration, governance and security challenges. Big Data Management replaces Informatica’s PowerCenter Big Data Edition with a separate, subscription-based product line aimed exclusively at big data environments.
The first component is Big Data Integration, which runs on Hadoop and promises to save companies time, trouble and, therefore, money over hand-coded data-integration, data-transformation and development work. This has been Informatica’s value proposition in the data warehousing arena for decades. Big Data Integration extends the promise to complex big data environments with intense volumes and varieties of batch and streaming data.
The other two components of Big Data Management are Informatica Big Data Quality and Governance, and Informatica Big Data Security. These may be of less immediate interest to companies that are just starting out with big data experiments. But Informatica argues these components will become increasingly important as the number and diversity of big data sources and projects grow.
The one question I have is whether, and for how long, enterprises will view and treat traditional data warehouse environments and big data projects as separate worlds.
Big Data Integration
Data-integration products are all about connecting to data, so vendors in this space invariably tout their portfolios of pre-built connectors. In Informatica’s case, Big Data Integration offers more than 200 connectors that speed and simplify access to data (as compared with hand-coding) and ingestion into Hadoop and NoSQL databases. The portfolio includes two-way integrations to modern big-data platforms, real-time sources and cloud-based apps and databases. Big Data Integration also offers 100-plus data-transformation and parsing routines, including options to handle variable and semi-structured data and big-data formats such as JSON, Avro and Parquet.
Flexibility is another draw for Big Data Integration. Running on Hadoop, it supports not just MapReduce for batch processing and Apache Spark for fast, in-memory batch or streaming-data processing, but also Tez and Informatica’s own high-performance Blaze engine, which offers familiarity to veteran PowerCenter users while taking advantage of distributed processing power and Hadoop’s YARN management layer. With all these options at its disposal, Big Data Integration can intelligently and automatically execute each workload on the best-suited engine, according to Informatica. Here, too, the idea is to speed execution while taking manual work steps out of big data projects.
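To make the idea of per-workload engine selection concrete, here is a minimal Python sketch. It is not Informatica's logic, which the company has not detailed here; the `Workload` fields, the `choose_engine` function, the rules and the 100 GB threshold are all hypothetical, chosen only to show how a dispatcher might route jobs among the four engines named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a data-integration job."""
    name: str
    streaming: bool                    # True for continuous/streaming input
    input_gb: float                    # approximate input size
    powercenter_mapping: bool = False  # job reuses a PowerCenter-style mapping


def choose_engine(w: Workload) -> str:
    """Pick an engine using illustrative rules (assumptions, not vendor logic)."""
    if w.streaming:
        return "spark"      # the only engine in this sketch that handles streams
    if w.powercenter_mapping:
        return "blaze"      # assumed best fit for PowerCenter-style mappings
    if w.input_gb < 100:    # arbitrary threshold for "smaller" batch jobs
        return "tez"
    return "mapreduce"      # fallback for very large batch jobs


jobs = [
    Workload("clickstream", streaming=True, input_gb=5),
    Workload("nightly_orders", streaming=False, input_gb=40, powercenter_mapping=True),
    Workload("log_backfill", streaming=False, input_gb=2000),
]
for job in jobs:
    print(job.name, "->", choose_engine(job))
```

A real optimizer would presumably weigh cluster load, data locality and cost estimates rather than fixed rules, but the sketch shows the kind of decision that would otherwise fall to a developer on every job.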
Adding Data Governance and Security
Where some big-data management offerings begin and end with ETL, Informatica Big Data Management also addresses data governance and security. Governance and data quality are always important, but their importance increases as the uses of big data multiply. That’s when the collaborative stewardship capabilities of Informatica Big Data Quality and Governance help ensure that all appropriate data stakeholders are involved in setting data definitions and standards. The various constituents get role-specific interfaces, while policy-based workflows, approvals and auditing features ensure compliance.
Big Data Quality and Governance is also about helping to find the value in big data. For example, a Live Data Map powered by Spark provides a universal metadata catalog and knowledge graph for enterprise data. This supports searching, matching and linking among transactional data, machine data and social data to illuminate behaviors and better understand customers, prospects and influencers.
Informatica started stepping up its security capabilities earlier this year when it released Secure@Source, which analyzes the metadata in Informatica PowerCenter repositories and spots sensitive data, such as payment card data, personally identifiable information and personal health information, as well as the systems, groups and departments at risk.
Informatica Big Data Security brings Secure@Source capabilities to big data, uncovering sensitive data and spotting the business units and individuals that have access to that data. The software gives data and security professionals insight into how such data is used and whether it’s adequately protected. Visualizations pinpoint sensitive data by geography and function while risk analytics highlight vulnerabilities that demand immediate remediation. Once alerts are raised, policy-based protections and dynamic data masking can be used to secure or de-identify sensitive data.
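For readers unfamiliar with dynamic data masking, the following Python sketch shows the general technique: sensitive values are masked at read time according to the viewer's role, while the stored data stays unchanged. It is a simplified illustration only, not how Informatica Big Data Security works; the regular expression, the `PRIVILEGED_ROLES` policy and the function names are assumptions made for the example.

```python
import re

# Rough pattern for 13- to 16-digit card numbers, allowing spaces or hyphens.
# An assumption for illustration; real detection is far more thorough.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

# Hypothetical policy: roles allowed to see unmasked values.
PRIVILEGED_ROLES = {"fraud_analyst"}


def mask_cards(text: str) -> str:
    """Replace all but the last four digits of each card-like number."""
    def _mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_RE.sub(_mask, text)


def view(text: str, role: str) -> str:
    """Return the text as a given role is allowed to see it."""
    return text if role in PRIVILEGED_ROLES else mask_cards(text)


record = "Paid with card 4111 1111 1111 1111 on 2015-11-02"
print(view(record, "marketing"))      # card number masked
print(view(record, "fraud_analyst"))  # card number visible
```

Because the masking happens when the data is read rather than when it is stored, the same record can serve both audiences under different policies.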
MyPOV on Informatica Big Data Management
Informatica previously offered PowerCenter Big Data Edition, its traditional data-integration suite with added big data management capabilities and a selective ability to run on Hadoop. But that’s being replaced by Big Data Management, which is a separate, subscription-based product that’s a better fit with the times, according to Informatica.
Big data projects are graduating from exploratory experiments to separately budgeted initiatives with executive sponsorship and increasingly mission-critical expectations, according to Informatica. Thus, the time is right for a separate product, Informatica execs reason. The subscription-based approach (with software deployed on-premises but paid for annually) is in keeping with the way Hadoop subscriptions are typically handled. This keeps initial costs down and helps businesses scale up as data volumes grow.
It’s good to see that Informatica has not taken an all-or-nothing approach with Big Data Management, offering an Enterprise Edition that skips some of the data-quality, data-profiling and data-masking features included in the Premium Edition. What’s more, the Big Data Security offerings are entirely optional, which is a good thing as security professionals with budget and technology oversight may not yet be familiar with Informatica.
My one concern about Informatica’s separation of its traditional and big data product lines is that we will eventually see the pendulum swing the other way. As big data sources, integration needs and data-quality and data-governance concerns become more commonplace, they will become the prevailing data challenges in the enterprise. Indeed, Informatica depicts a “crossing the chasm” movement from traditional data-warehouse use cases to “next-gen” big data use cases (see diagram above).
When the transition is complete, big data will just be data. And when data-management professionals stop looking at big data as a separate world, they may well want and expect a single portfolio to address all data management needs, whether large or small. I suspect that day may come sooner than many expect.