Which Came First? The Chicken (Data Catalog) or the Egg (Metadata)?

by | Feb 19, 2020 | Data Governance, Impact Analysis

In a recent webinar titled “Maximize Your Data Catalog Investment,” attendees submitted questions, one of which got me thinking about the old dilemma of the chicken or the egg. The question was whether their Fortune 500 company should harvest metadata first or focus instead on building a data catalog. In a way it is a paradoxical question: a data catalog is composed of harvested metadata, so a large organization really can’t have one without the other. Or can it?

The more fundamental question is this:

Are you harvesting and extracting the full meaning of the metadata so that you can enrich the data catalog with granular metadata details?

This is often the missing link in harvesting metadata. Metadata is, in a nutshell, messy to harvest. Why is that?

Metadata is more than just schema, table, and column information. Reading that is what some might consider the easy part. The challenging part is harvesting metadata beyond the simple schema of tables and columns, and for that the organization needs what is referred to as a lineage extractor.

Enriching Data Catalogs with Granular Details Answers Key Data Governance Questions

The magic of a lineage extractor is that it decodes two key pieces of information needed to enrich your data catalog with the nitty-gritty details that make the difference in quickly answering your key data governance questions:

 

  1. Source to destination lineage
  2. Data transformations

Take, for example, a simple question such as “For the data on that BI report, what is the source, how many hops did it take to get there, and what transformations took place along the way?” This is the tricky part of metadata: to compose a granular data lineage flow complete with transformations, you must use a lineage extractor to decode the various scripts and stored procedures embedded within the metadata.
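To make the “how many hops” question concrete, here is a minimal Python sketch. It assumes lineage has already been extracted into simple (source, target, transformation) records; the table names, the report name, and the `trace_upstream` helper are all hypothetical and only illustrate the idea.

```python
from collections import deque

# Hypothetical lineage records: (source, target, transformation applied on that hop).
LINEAGE = [
    ("crm.customers", "staging.customers", "trim and lowercase email"),
    ("staging.customers", "dw.dim_customer", "deduplicate on customer_id"),
    ("dw.dim_customer", "bi.sales_report", "join to fact_sales, aggregate by region"),
]

def trace_upstream(target, edges):
    """Return every hop that feeds `target`, farthest-upstream hop first."""
    # Index the edges by destination so we can walk backwards from the report.
    incoming = {}
    for src, dst, transform in edges:
        incoming.setdefault(dst, []).append((src, transform))

    hops = []
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for src, transform in incoming.get(node, []):
            hops.append((src, node, transform))
            if src not in seen:
                seen.add(src)
                queue.append(src)

    hops.reverse()  # the walk went report -> source; flip it for reading
    return hops

hops = trace_upstream("bi.sales_report", LINEAGE)
print(f"{len(hops)} hops")
for src, dst, transform in hops:
    print(f"{src} -> {dst}: {transform}")
```

If the transformation on any hop was never harvested, this question cannot be answered from the catalog, no matter how complete the table and column listing is.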

This leads to the question of whether some metadata is particularly hard, and thus more complicated, to harvest. The short answer is that it depends on the source system. We have solved the mystery of extracting metadata using a lineage harvester methodology, which leverages machine learning and is much more powerful than a traditional metadata scanner. Many data catalogs are first created with a minimum viable product (MVP) approach: scan the metadata and gather it into a data catalog. After a while, it becomes apparent that the data catalog is missing the hard-to-decipher metadata, the details on the various data transformations that paint a complete picture. And where the source system is complicated to harvest because of customizations and proprietary formats, a custom lineage extractor is often necessary to accurately extract the granular metadata.
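As a rough illustration of the difference between scanning and extracting, the sketch below pulls source-to-target pairs out of the SQL inside a stored procedure. It is deliberately naive: it uses regular expressions on a single INSERT ... SELECT statement, whereas a production lineage extractor would rely on a full SQL parser and is not what this simplified example represents. The SQL text and the `extract_lineage` helper are hypothetical.

```python
import re

# Hypothetical stored-procedure body. A schema scanner would only report that
# these tables exist; the relationship between them lives in the SQL itself.
PROCEDURE_SQL = """
INSERT INTO dw.dim_customer (customer_id, email)
SELECT customer_id, LOWER(TRIM(email))
FROM staging.customers
JOIN staging.accounts ON staging.accounts.customer_id = staging.customers.customer_id
"""

def extract_lineage(sql):
    """Naively return (source_table, target_table) pairs for one INSERT ... SELECT."""
    target = re.search(r"\bINSERT\s+INTO\s+([\w.]+)", sql, re.IGNORECASE)
    if target is None:
        return []
    # Every table named after FROM or JOIN is treated as a source.
    sources = re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", sql, re.IGNORECASE)
    return [(source, target.group(1)) for source in sources]

for source, target in extract_lineage(PROCEDURE_SQL):
    print(f"{source} -> {target}")
```

Even this toy version surfaces relationships that a table-and-column scan never sees, which is the gap an MVP catalog tends to run into.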

Identify Your Business Problems

So, it really doesn’t matter which comes first, the chicken (data catalog) or the egg (metadata). We can debate which is the better starting point, but where you should start often depends on the business and technical problems you set out to solve.

What invariably happens is that, no matter where you start, each depends on the other for the long-term survival of the project.

What we’ve found from our experience with clients is that there is always a breaking point when it becomes imperative to unlock the hidden meaning in the metadata to enrich the data catalog, so that it can answer 99% of the data governance questions being asked.

What kills a data governance initiative in the long term is a lack of trust, often rooted in missing elements such as granular lineage and transformation details. People simply throw up their hands and say, “I don’t know the answer, but we can research it.” No one minds that answer if “research it” means the answer will arrive promptly. But 9 times out of 10, if your data catalog has gaps caused by missing metadata details, the answer is not going to be prompt. It often ends up taking days or weeks and a lot of manual effort.

Impact Analysis – Your Most Powerful Tool

This leads us to one of the most powerful tools at your disposal once you have the complete metadata picture, one that traverses source to destination with the transformations in between: impact analysis, a map of the interconnected metadata relationships between everything in your data catalog. The question people often ask is, “How can we be more effective at impact analysis?” The answer, as you might guess, is rather simple. The more complete your metadata harvesting, leveraging a lineage extractor methodology, the more effective your impact analysis will be. When you’re missing any piece of your metadata puzzle, you will not be able to quickly answer questions such as “If I change this data source, how many processes and business reports will be impacted?” If there are gaps in your metadata, you won’t know which part of the puzzle is missing until after you initiate a change and later find out you unexpectedly broke something else.
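At its core, impact analysis is a downstream walk over the lineage graph. The sketch below is a minimal illustration under assumed data: the asset names, the edge list, and the `impacted_by` helper are hypothetical, and a real catalog would hold far richer relationships.

```python
from collections import deque

# Hypothetical lineage edges (upstream asset, downstream asset).
EDGES = [
    ("crm.customers", "etl.load_customers"),
    ("etl.load_customers", "dw.dim_customer"),
    ("dw.dim_customer", "bi.sales_report"),
    ("dw.dim_customer", "bi.churn_dashboard"),
    ("erp.orders", "dw.fact_sales"),
    ("dw.fact_sales", "bi.sales_report"),
]

def impacted_by(changed, edges):
    """Return every asset reachable downstream of `changed`."""
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    impacted = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in impacted:
                impacted.add(nxt)
                queue.append(nxt)
    return impacted

print(sorted(impacted_by("crm.customers", EDGES)))

# Simulate a gap: the hop out of the CRM source was never harvested.
edges_with_gap = [edge for edge in EDGES if edge[0] != "crm.customers"]
print(sorted(impacted_by("crm.customers", edges_with_gap)))
```

With the full edge list, a change to the CRM source reaches the load job, the dimension table, and both reports. With one hop missing, the same query comes back empty, which is exactly how a change ends up breaking something unexpectedly.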

 

Don’t Forget to Extract Lineage from Your ETL or ELT Framework

Gaining a clearer understanding of what you typically analyze is the name of the game. There is no doubt that by using a lineage extractor instead of a simple scan of your database, you can automate your processes and gain efficiency, speed, and a clearer understanding of your data. But don’t forget that you will also need to conduct lineage extraction on your data movement technology; for most organizations, that is the ETL (or ELT) data processing framework. ETL is one of the most commonly used methods of data integration in the enterprise. Forgetting to extract lineage here means missing the point where most of the data transformation (or magic!) happens.

 

Reciprocal Relationship – You can’t have one without the other!

Now you see why the egg (metadata) is the source that creates the chicken (data catalog). Sure, we can debate which came first, but we can all agree that you can’t have one without the other.

Interested in learning more about the power of lineage extractors, impact analysis and more? Explore a unique Unified Enterprise Data Governance framework that provides the granular metadata to enrich your data governance catalog.

 
