Metadata – It’s More Than Data about Data. The Power is in Understanding the Connections Between Metadata.
The Metadata Addiction
Metadata is hidden almost everywhere in our everyday lives. In fact, we are just about dependent on it. But how?
Imagine jumping in your car, plugging in your smartphone and cueing your favorite music playlist.
What automatically appears? Album, artist, song duration, playlist, and more. This is metadata about your musical universe, telling you something about the music you own. It lets you do things you otherwise could not, such as finding every song by a given artist.
Your favorite GPS application is another example. GPS coordinates are one of the many pieces of metadata it relies on. If you use the Waze traffic app, that location metadata is correlated with traffic conditions.
No matter what you do today or tomorrow, one thing is certain: metadata will be part of it. We are all addicted, and we did not even know it.
The Metadata Organizational Challenge
What if we had a tool that could automatically scan all of the metadata contained in our array of enterprise and cloud applications, so that we could create a giant repository of all our metadata? Would this be nirvana? No. Let me tell you why.
Metadata is meaningless as a data point. Take, for example, two GPS coordinates: one as a source and the second as a destination. Going back to our map example, we need more than just these two data points to compute our route. We need metadata containing all the information about all of the roads in between, and if we take it a step further, we want to know about construction delays and accidents to ascertain the fastest route between point A and point B.
The point is that it’s all about metadata relationships. When we can relate metadata, we unleash the power of metadata to solve problems.
In the case of the mapping example, we solve the problem of getting from point A to point B as fast as possible, with one important side benefit: we don’t get lost.
Creating Relationships Between Data Sources and Destinations in the Enterprise
Metadata in the enterprise isn’t much different. But there’s a challenge. Scanning for metadata typically yields schema.table.column identifiers, each of which provides only one data point. Take it a step further and we can create a relationship between data source and destination, but an important piece is still missing. You’re probably wondering by now what that is. In any enterprise there are data transformations (typically implemented in either ETL or ELT applications) between the source and destination. Data lineage is incomplete and limited in its use without those transformations. Both business and IT users want to know what data transformations took place between source and destination, so they can clearly understand the data flow from point A (source) to point B (transformation) to point C (destination). If we had all three, we’d truly unleash the power of metadata to answer common questions that IT and the business are asking. For example, IT benefits from improved DataOps efficiency. Business benefits from better regulatory compliance. These are just two examples among a host of others.
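To make the A-to-B-to-C idea concrete, here is a minimal Python sketch of how a lineage record might be modeled. It is only an illustration: the LineageEdge class, the column names and the transformation descriptions are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """One hop of lineage: source (A) -> transformation (B) -> destination (C)."""
    source: str          # hypothetical schema.table.column of the input
    transformation: str  # plain-language description of the ETL/ELT rule
    destination: str     # hypothetical schema.table.column of the output


# Hypothetical lineage for a two-hop data flow.
EDGES = [
    LineageEdge(
        "crm.customers.birth_date",
        "age = years between birth_date and load date",
        "dw.dim_customer.age",
    ),
    LineageEdge(
        "dw.dim_customer.age",
        "bucket age into ten-year bands",
        "mart.customer_report.age_band",
    ),
]


def trace_upstream(column, edges):
    """Walk back from a destination column, collecting every hop that feeds it."""
    hops = []
    frontier = [column]
    seen = set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for edge in edges:
            if edge.destination == current:
                hops.append(edge)
                frontier.append(edge.source)
    return hops
```

Calling trace_upstream("mart.customer_report.age_band", EDGES) returns both hops, so a business user can see not only where the report column came from but also which rules were applied along the way.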
The Power of Metadata Relationships
Let’s use a simple analogy. Think of each piece of metadata as a marble, and say you have 10,000 marbles spread all over your living room floor. How does this relate to the enterprise? You have an array of disparate systems housing all of your metadata marbles, and you yearn to organize them to gain control over your metadata in a data governance context. As described earlier, what’s required is to unlock the power of the metadata relationships.
The challenge is to organize the relationships between the metadata marbles so that we can visualize how the data flows from source to destination, with the transformations in between. Such an organization may also reveal other interesting facts about your systems. How about finding systems that no one, or very few users, actually use? How about finding multiple systems that use the same data sources and apply very similar transformations for different sets of users? Maybe you could rationalize some of these systems, saving valuable resources and gaining operational efficiencies.
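As a rough illustration of the rationalization idea, the Python sketch below compares systems by the (source, transformation) pairs they use and flags pairs of systems that overlap heavily. The system names, the sample data and the 0.8 threshold are all assumptions made for the example, and Jaccard similarity is just one simple way to approximate "very similar".

```python
from itertools import combinations

# Hypothetical inventory: each system mapped to its (source, transformation) pairs.
SYSTEMS = {
    "sales_dashboard": {
        ("crm.orders.amount", "sum by region"),
        ("crm.orders.region", "passthrough"),
    },
    "finance_report": {
        ("crm.orders.amount", "sum by region"),
        ("crm.orders.region", "passthrough"),
    },
    "hr_portal": {
        ("hr.employees.salary", "average by department"),
    },
}


def jaccard(a, b):
    """Share of items the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_systems(systems, threshold=0.8):
    """Return (system_a, system_b, score) for every pair at or above the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(systems), 2):
        score = jaccard(systems[name_a], systems[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs
```

With this sample data, overlapping_systems(SYSTEMS) flags finance_report and sales_dashboard as candidates for consolidation, while hr_portal is left alone.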
It sounds easy in theory, but what’s required is a new approach: a metadata lineage extractor to do the hard work for us.
The Traditional Approach
First, let’s describe the status quo. The traditional approach is a connector or bridge that reads from exported metadata, system tables or a defined API and moves selected metadata from a source into a metadata repository. Think of this as the Gen 1 approach to metadata. A connector or bridge is limited by what the API allows you to do. For example, the API may not expose all the metadata you want, such as the transformations taking place in the ETL/ELT source code. The question is: is there a better way?
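To see the limitation in miniature, consider this Python sketch of a connector-style scan. It uses an in-memory SQLite database purely as a stand-in for an enterprise system catalog (an assumption for illustration; the table names are hypothetical and real connectors target each vendor’s own catalog or API).

```python
import sqlite3

# Stand-in for an enterprise database; the tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE dim_customer (id INTEGER, full_name TEXT)")


def scan_columns(connection):
    """List every column as schema.table.column using only the system catalog."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    columns = []
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for row in connection.execute(f'PRAGMA table_info("{table}")'):
            columns.append(f"main.{table}.{row[1]}")
    return columns
```

The scan finds main.customers.name and main.dim_customer.full_name, but nothing in the catalog says that one feeds the other or what rule was applied in between. That information lives in the ETL/ELT code, which this style of scan never reads.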
The Lineage Extractor to the Rescue
What if we could do more than just connect or bridge metadata from available system tables, metadata interchange formats or other metadata repositories? What if we could extract the data relationships and transformations directly from the source code, while enriching that information with metadata acquired from system catalog tables, business glossaries, master data management systems, reference data tables, operational metadata, or other metadata repositories, along with user-supplied metadata? If we could do this, it would be a game changer.
This would be a fundamental shift in how metadata is acquired, viewed, and consumed. It would look like this. First, we’d analyze the source code. Next, we’d enrich that information with the metadata available from the standard metadata sources. Then we’d extract all of that into a rich meta-model that describes not only the data sets, but also the flow of data between those sets. Finally, we’d load it into the target information catalog in the format that it requires. Presto! We just solved one of the most perplexing problems, bringing meaning to your metadata, by connecting the dots from point A (source) to point B (transformation) to point C (destination).
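The steps above can be sketched in a few lines of Python. This is a toy, not how any real extractor works: it handles only one narrow shape of SQL statement with a regular expression, whereas production ETL/ELT code would need a full parser. The SQL statement, the catalog contents and the JSON output format are all hypothetical.

```python
import json
import re

# Hypothetical ETL statement and catalog metadata used for illustration.
ETL_SQL = "INSERT INTO dw.dim_customer (full_name) SELECT UPPER(name) FROM crm.customers"
CATALOG = {
    "crm.customers": {"owner": "sales-ops"},
    "dw.dim_customer": {"owner": "analytics"},
}

# Matches only single-column INSERT ... SELECT ... FROM statements.
PATTERN = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.]+)\s*\((?P<column>\w+)\)\s*"
    r"SELECT\s+(?P<expr>.+?)\s+FROM\s+(?P<source>[\w.]+)",
    re.IGNORECASE,
)


def extract_lineage(sql):
    """Step 1: analyze the source code to find source, transformation, destination."""
    match = PATTERN.search(sql)
    if match is None:
        return None
    return {
        "source": match["source"],
        "transformation": match["expr"],
        "destination": f'{match["target"]}.{match["column"]}',
    }


def enrich(record, catalog):
    """Step 2: add metadata from a standard source (here, table owners)."""
    enriched = dict(record)
    destination_table = record["destination"].rsplit(".", 1)[0]
    enriched["source_owner"] = catalog.get(record["source"], {}).get("owner")
    enriched["destination_owner"] = catalog.get(destination_table, {}).get("owner")
    return enriched


def to_catalog_payload(record):
    """Steps 3 and 4: serialize the model in the (assumed) format the target expects."""
    return json.dumps(record, sort_keys=True)


payload = to_catalog_payload(enrich(extract_lineage(ETL_SQL), CATALOG))
```

The resulting payload carries the source table, the UPPER(name) transformation and the destination column together, which is exactly the three-point relationship a catalog-only scan cannot provide.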
So, the next time you hear people describe bridges or connectors, the two common methods of extracting metadata, think of a Lineage Extractor, which extracts the data flow and transformations directly from the source code.
The Added Bonus of Impact Analysis
By using Lineage Extractors to solve the metadata challenge facing every enterprise, you’ll also get a side benefit: Visual Impact Analysis. You’ll be able to quickly see the impact of any change to a data element or application on downstream data and applications. In addition, you’ll know which tables, ETL jobs or BI reports are affected by a single change to a data element or a transformation rule. Imagine the DevOps efficiencies this would create! Imagine the quality improvements, as you would instantly know which downstream objects or applications need to be regression tested as a result of any change anywhere in the enterprise data landscape.
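Once lineage is stored as relationships, impact analysis reduces to walking the graph downstream from the changed element. The Python sketch below shows the idea with a breadth-first traversal; the object names and the DOWNSTREAM mapping are hypothetical examples, not output from a real tool.

```python
from collections import deque

# Hypothetical lineage: each object mapped to the objects that consume it.
DOWNSTREAM = {
    "crm.customers.birth_date": ["etl.load_dim_customer"],
    "etl.load_dim_customer": ["dw.dim_customer.age"],
    "dw.dim_customer.age": ["bi.customer_age_report", "etl.build_marketing_mart"],
    "etl.build_marketing_mart": ["mart.campaign_targets.age_band"],
}


def impacted_by(changed, downstream):
    """Return every downstream object affected by a change, nearest first."""
    impacted = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted
```

Calling impacted_by("crm.customers.birth_date", DOWNSTREAM) lists the ETL jobs, tables and BI report that would need regression testing if that one column changed.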
As we stated at the beginning of this blog, metadata is meaningless as a data point. Now you understand how using a Lineage Extractor approach instead of the traditional bridge or connector can be a game changer in how you capitalize on the power of your metadata.
Learn More About Compact’s Unified Data Governance Framework™