Data Governance Success is Contingent on Realizing the Vision that Data is an Asset.
Data Governance Liabilities are Costing You Money
The fact is that three data liabilities – incomplete data lineage, poor data quality, and undetected PII – are weighing organizations down.
Data Governance Yields a Competitive Advantage
What’s missing is a unified data governance framework that solves the most perplexing issues in governing data. But first we must keep in mind three macro trends that are driving the need to uncover and fix the three key culprits preventing governance success.
First, the demand for data insights is growing exponentially. Take the BI market, which Gartner forecasts will grow to $22.8B by 2020, along with a startling statistic from IDC, which predicts a 10-fold rise in worldwide data by 2025. More data and more demand for insights mean building a unified enterprise data governance framework that puts all the important data repositories and applications at your fingertips. According to Forbes, 80% of the time in analytical projects is spent on data preparation, leaving less time for analysis and resulting in fewer completed analytical projects. Even a modest reduction in data preparation time will mean faster time to market, saving your business money and giving you a competitive advantage. The key is to build a unified data governance framework that lets you find all your data and application assets, and their relationships, in one easy interface.
Cracks in the Data Governance Framework
As most DevOps professionals know from personal experience, and as Computer Weekly also points out, three common issues crack the data governance framework: (1) data lineage, (2) incomplete data, and (3) data redundancy. Imagine a data governance console where you can see at once all the data sources and targets and every application that touches this data, including reports and analytics. Imagine that this console also shows you the data transformation rules, data quality rules, and operational metadata about the applications. Imagine how much time that would save in data preparation for a new analytics project, or in identifying every data object that stores personally identifiable information (PII) about your customers. Most enterprises lack such a data governance console for two simple reasons: without automation, it is complex and costly to implement.
However, before we dive into the missing elements of the data governance framework, let’s get a macro-level understanding of the metadata and data quality market.
Metadata and DQ Trends
Internally, many organizations building a business case need evidence that other organizations are experiencing similar problems. So let’s put a spotlight on the analyst trends for metadata and data quality, which show that both metadata management and data quality are forecast to experience double-digit growth over the next few years.
- Metadata Trends: Growth Rate of 24.1% – Gartner’s 2018 “The State of Metadata Management” indicates that by 2021, “Organizations will spend twice as much effort in managing metadata as compared to 2018.” In spending dollars, the numbers are even more compelling: the enterprise metadata management market report by MarketsandMarkets™, which provides quantified B2B research, indicates the metadata market will grow at a compound annual growth rate of 24.1% through 2022, from $2.67B to $7.85B. The drivers behind this growth include increasing data volumes, operational efficiency, data quality management, and the all-too-common fear driver: regulations.
- Data Quality Trends: 17.7% Growth Rate – Gartner points out two fundamental shifts in data quality. First, in its key trends for modern data quality tools, Gartner notes that DQ tools are undergoing a strategic shift driven by governance, data diversity, analytics, and a changing audience. The audience is particularly notable, as the discipline of governance is maturing to include a business audience ever more dependent on both governance and quality for BI. Second, Gartner hits the nail on the head by noting that poor data quality hits an organization’s pocketbook: $15M in average annual cost. As for investment, data quality is growing at a compound annual growth rate of 17.7% through 2020, according to the MarketsandMarkets™ research report on the data quality tools market.
The $64,000 question is this: which investments in metadata and data quality will reap the greatest dividends for your organization, and what is the cost of the status quo?
Why do we have a love/hate relationship with Metadata and Data Quality?
The fact is everyone loves metadata and data quality because their everyday lives depend on it. Well, we might not love it, but we do hate it when it’s messed up. Imagine if you bet your career on a new data insight and presented it to the board only to have it discredited and picked apart when you couldn’t sufficiently answer simple questions such as:
- What is the quality of the data?
- Did you consider how the source data was calculated/transformed?
- Did we mask all of the Personally Identifiable Information (PII)?
When you don’t know the answers and respond with “I’ll need to research it,” you have missed an opportunity to make your case to the board. Not being able to answer data-related questions is a tell-tale sign that your metadata, PII detection, and DQ tools aren’t providing business users with data transparency. And transparency is a crucial element of any data governance framework.
How Can We Cut our Data Liability with a Unified Data Governance Framework?
The reason organizations haven’t completely solved these sticky issues (data lineage, data quality, and PII) is that they are genuinely complex. The drivers to solve them are also three-fold:
- Without automated data lineage, DevOps teams spend too much time on data identification and preparation.
- Poor data quality reduces confidence and increases rework and time to market, costing money and lost opportunities.
- Failing to detect all PII results in penalties for compliance violations and damage to the organization’s reputation in the market.
The question is whether there is a better way to solve them.
- The Metadata Dilemma: It’s Complex Because It’s Complex – The missing DNA in a data governance framework is automated metadata extraction. The huge challenge is that extracting metadata from complex SQL scripts, stored procedures, and legacy languages such as COBOL or SAS isn’t easy to automate. Without the capability to scan the metadata and extract an accurate data lineage, including the transformations between source and destination, you end up with gaps in your lineage. And without complete end-to-end data lineage, you will spend time filling those gaps manually, costing you time and money.
- Next Gen Metadata: Lineage Extractors – First off, what is a lineage extractor? Simply put, it’s a methodology, based on a technology, that scrounges through all those complex SQL scripts and stored procedures to complete the technical data lineage picture. At the heart of a lineage extractor is an automated process that not only scans the metadata for source-to-destination connections but also decodes the data transformations in between. This is the missing DNA that has been so complex to solve. The lineage extractor goes well beyond scanning metadata for table.column definitions: it decodes the data transformations (typically implemented in ETL or ELT applications) between the source and destination. This approach delivers what BI data users have been searching for – exposing the devil in the details of lineage, the transformations between source and destination.
- Next Gen Data Lineage: Get Down to the Record Level – Why do you need granular, record-level detail? That’s simple: to solve regulatory issues such as GDPR. Record-level data lineage identifies where personally identifiable information (PII) exists and how it moves through internal and external systems. The key is to push metadata analysis down to the record level by automating your PII data tracking processes, revealing all of your PII data.
- The Data Quality Dilemma: The Root Problem Is Always Upstream – Every enterprise has a DQ strategy, with various tools implementing rules to ferret out poor-quality data. But what is being done upstream, at the application? Typically, IT performs manual testing: creating test cases and test data, specifying expected results, running test scripts, reporting test statuses, and managing all of the testing artifacts over time. This approach is laborious for IT and prone to error. Any errors that propagate downstream impact BI analysis, and when business users surface data quality issues (it’s only a matter of time), it invariably results in rework by IT to find and fix the root cause, in addition to delaying the BI analysis. Is there a better way? What’s needed is an automated testing platform that makes it easy to test any data application (ETL/ELT) automatically, repeatedly, and on demand, while maintaining a history of all test runs. This becomes even more important if you use an Agile development methodology, where there is even less time for testing. Automated testing of data-intensive applications has to be the backbone of your data quality strategy, and it should be integrated into your data governance framework.
- Next Gen Data Quality Application Testing: Addressing the Upstream DQ Problem Head On – Few enterprises have yet augmented their DQ strategy with data application testing. You might wonder: what is data application testing? It’s simply a methodology that automates the typically manual methods used for application testing. One of the biggest issues with the manual approach is that it is often a once-and-done or spot-check strategy, leaving the organization vulnerable to risk. Data application testing automates all of that manual work, improving operational efficiency by supporting all types of testing – unit, system and integration, regression, and performance. The end goal is to reduce the time, cost, and errors in your ETL applications.
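To make the lineage-extractor idea above concrete, here is a minimal sketch, not any vendor’s actual implementation. It assumes simple `INSERT INTO … SELECT` statements and uses a naive regex; a real extractor would use a full SQL parser to handle CTEs, subqueries, nesting, and dialect quirks:

```python
import re

def extract_lineage(sql: str):
    """Extract (target, source, column expressions) from simple
    INSERT INTO ... SELECT statements. The selected expressions are
    the 'transformations in between' that plain metadata scans miss."""
    lineage = []
    pattern = re.compile(
        r"INSERT\s+INTO\s+(?P<target>[\w.]+).*?"
        r"SELECT\s+(?P<cols>.*?)\s+FROM\s+(?P<source>[\w.]+)",
        re.IGNORECASE | re.DOTALL,
    )
    for m in pattern.finditer(sql):
        lineage.append({
            "target": m.group("target"),
            "source": m.group("source"),
            "transforms": [c.strip() for c in m.group("cols").split(",")],
        })
    return lineage

script = """
INSERT INTO dw.customer_dim
SELECT UPPER(name), TRIM(email) FROM staging.customers;
"""
print(extract_lineage(script))
```

Even this toy version captures the point: the output records not just that `staging.customers` feeds `dw.customer_dim`, but which transformations (`UPPER`, `TRIM`) happen along the way.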
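Record-level PII detection, as described above, can likewise be sketched in a few lines. The two patterns below (email and US SSN formats) are illustrative assumptions only; production detectors combine many more patterns with checksums, dictionaries, and statistical classifiers:

```python
import re

# Illustrative patterns only -- real detectors cover far more PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_records(records):
    """Return (record index, field, PII type) for every match,
    i.e. PII tracking at the granular record level."""
    findings = []
    for i, record in enumerate(records):
        for field, value in record.items():
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    findings.append((i, field, pii_type))
    return findings

rows = [
    {"id": 1, "note": "contact jane@example.com"},
    {"id": 2, "note": "SSN on file: 123-45-6789"},
]
print(scan_records(rows))
```

Feeding these findings back into the governance console is what lets a business user ask “where is our PII?” and get an answer without a research project.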
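Finally, a minimal sketch of automated data application testing, using a hypothetical ETL step that uppercases names and drops blank records. Each check is a repeatable assertion that can run on demand after every load, with results logged as test history:

```python
def transform(rows):
    """Hypothetical ETL step under test: uppercase names, drop blanks."""
    return [{**r, "name": r["name"].upper()} for r in rows if r["name"].strip()]

def run_test_suite(source_rows):
    """A tiny automated harness: completeness and rule checks that
    replace manual spot-checking with repeatable, on-demand tests."""
    target_rows = transform(source_rows)
    return {
        # Completeness: no non-blank source record lost in flight
        "row_count": len(target_rows)
                     == sum(1 for r in source_rows if r["name"].strip()),
        # Rule check: the uppercase transformation was applied everywhere
        "uppercased": all(r["name"] == r["name"].upper() for r in target_rows),
    }

source = [{"name": "alice"}, {"name": " "}, {"name": "Bob"}]
print(run_test_suite(source))
```

Because the suite is just code, it can run in every Agile iteration and in regression after every change, which is exactly the upstream discipline the dilemma above calls for.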
The Side Benefits of this Next Gen Approach
It’s certain that data volumes will continue to increase and stress existing approaches to the breaking point. Solving the complex metadata problem will provide a 360-degree view of your data through a visual impact analysis that even a lay business user can understand. And solving the data quality conundrum at the root, through automated data application testing, will prevent a cascade of DQ issues downstream. As a side benefit, any regulatory requirement demanding data transparency (e.g., GDPR, IFRS 15, BCBS 239) becomes easy to satisfy through detailed evidence of your DQ rules, PII detection process, and granular metadata lineage, quickly sending the regulator on their way to another organization still stymied by these complex problems. And let’s not forget that this makes life easier for the IT professional: automation improves operational efficiency.
Breaking the Status Quo
We have three choices: decide to take action, decide not to take action, or – perhaps worst of all – make no decision at all. One thing is certain from global macro trends in every industry (data management included), and we see it in our everyday lives: automation is the future. The question we must ask is, “Is it time to break the status quo and investigate the value of a unified data governance framework?”
A Unified Data Governance Framework
The Compact Unified Enterprise Data Governance™ Framework takes a unique approach by leveraging market leading data governance catalogs (Collibra, IBM, Informatica, etc.), and turning them into Unified Enterprise Data Governance Consoles, by delivering enriched data lineage, data quality, and PII metadata automatically. With enriched data, the business user can quickly ascertain detailed data lineage with transformations, quickly reveal PII data, and achieve higher data quality scores by fixing data application issues at the source.
Interested in learning more?
Explore how next gen lineage extractors, impact analysis, data tracking and data application testing automation can be used in your enterprise to solve these vexing data governance problems that have been hard to solve.
Where do answers lie in your business?
One Lincoln Center
18W140 Butterfield Road
Oakbrook Terrace, IL 60181