The Enhanced Metadata Story – Machine Learning Powered Metadata

by | Aug 15, 2019 | Data Governance, Data Lineage

Data indicates the facts, but metadata tells the story. Traditionally, that story was limited to extracted table, schema, and column information. Machine learning enhances it: by applying its algorithms to metadata, we can detect patterns that were previously hidden in code and that traditional lineage tools didn’t expose. The problem is that humans can’t analyze the tens of thousands of objects hidden in complex data lineage. To do so, we need to combine two disciplines, metadata lineage extraction and machine learning, to reveal smart lineage visualizations, perform deep code analytics, and compare data lineage as it changes over time.

The opportunity is to crack the code of data lineage, just as the legendary codebreakers of World War II broke the Enigma codes to decipher encrypted communications and gain a competitive advantage. Leveraging AI can also reduce IT spend in a world where we can’t throw unlimited resources at the problem.

What if we could analyze tens of thousands of data lineage elements with machine learning and expose what was previously impossible to see? This is no longer theoretical: it is now possible to untangle the spaghetti of data lineage and to extract and understand complex metadata.

Harnessing Machine Learning

Machine learning makes life easier by surfacing new insights that help you better understand the organization’s complex metadata.

What if you could have more than simple source-to-destination data lineage flows and instead have access to smart lineage visualizations? In practice, visualization at this level is crucial for achieving regulatory compliance. However, we don’t want just another silo of information. What we need is a unified data governance framework that feeds this rich metadata, enhanced with granular data lineage, into a unified data governance console.

Connecting Lineage with Your Data Catalog

The purpose of pairing data lineage with a data catalog is to connect all the information required to achieve regulatory compliance, DevOps efficiency, data privacy, and security. Imagine if your data catalog solution, such as Collibra, Informatica, or IBM, could be enriched with smart lineage visualizations, deep code analytics, and lineage versioning.

Let’s dive into three must-have capabilities that are now possible with machine learning powered data lineage.

1. Smart Lineage Visualizations

Simplifying very large data lineage graphs requires machine learning capabilities that can handle extreme use cases involving tens of thousands of objects. Smart lineage visualization goes a step beyond the capabilities of many current data catalogs by doing what humans previously had to do manually: analyzing the metadata and presenting it in a digestible form. With this capability we can move beyond a data lineage flowchart to an enriched visualization that makes sense of extremely large and complex data lineage. The real power comes when this smart lineage visualization enriches your data catalog, such as Collibra, Informatica, or IBM, so that you can truly understand your data lineage.
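
One basic ingredient of simplifying a large lineage graph can be sketched in a few lines of Python. This is not machine learning and is not how any particular product works; it is a plain aggregation that rolls column-level lineage up to table level so that far fewer edges need to be drawn. The `schema.table.column` naming convention and the function names are assumptions made for illustration.

```python
from collections import Counter


def table_of(column: str) -> str:
    """Map 'schema.table.column' to 'schema.table' (assumed naming convention)."""
    return column.rsplit(".", 1)[0]


def collapse_to_tables(column_edges):
    """Roll column-level lineage edges up to table-level edges.

    Returns a dict mapping (source_table, target_table) to the number of
    column-level edges it summarizes. Edges that stay inside one table
    are dropped, since they add no table-level flow.
    """
    counts = Counter()
    for src, dst in column_edges:
        src_table, dst_table = table_of(src), table_of(dst)
        if src_table != dst_table:
            counts[(src_table, dst_table)] += 1
    return dict(counts)


# Hypothetical column-level lineage, for illustration only.
edges = [
    ("stg.orders.id", "dw.orders.order_id"),
    ("stg.orders.total", "dw.orders.total"),
    ("dw.orders.total", "dw.orders.total_usd"),
    ("dw.orders.order_id", "mart.sales.order_id"),
]

for (src, dst), n in collapse_to_tables(edges).items():
    print(f"{src} -> {dst} ({n} column mappings)")
```

The edge counts could drive line thickness in a visualization, so that a reader sees the heaviest flows first and drills into column detail only where needed.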

2. Deep Code Analytics

Diving into the code is laborious and requires subject matter experts to decipher what someone else built and to understand the intricate transformations in your ETL/ELT code. Applying machine learning algorithms to perform deep code analytics speeds up the discovery of duplicate or redundant code hidden within your data management application landscape. It also simplifies migration initiatives: rationalizing the ETL/ELT code first means there is less to move from one technology to another. The key power of deep code analytics is the automatic detection of duplicate or similar code among thousands of scripts and processes within your enterprise applications.
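
To make the idea of detecting duplicate or similar code concrete, here is a minimal sketch in Python. It assumes a simple, non-ML technique (Jaccard similarity over token shingles) rather than whatever algorithm a lineage product actually uses, and the script names, the 0.8 threshold, and all function names are hypothetical.

```python
import re
from itertools import combinations


def normalize(sql: str) -> list[str]:
    """Strip SQL comments, lowercase, and split a script into tokens."""
    sql = re.sub(r"--[^\n]*", " ", sql)               # drop line comments
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)  # drop block comments
    return re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\s\w]", sql.lower())


def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-token windows."""
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(scripts: dict[str, str], threshold: float = 0.8):
    """Return (name, name, score) for every pair at or above the threshold."""
    sets = {name: shingles(normalize(text)) for name, text in scripts.items()}
    pairs = []
    for left, right in combinations(sorted(sets), 2):
        score = jaccard(sets[left], sets[right])
        if score >= threshold:
            pairs.append((left, right, round(score, 3)))
    return pairs


# Hypothetical ETL scripts, for illustration only.
scripts = {
    "load_customers_v1.sql": "INSERT INTO dw.customers SELECT id, name "
                             "FROM stg.customers WHERE active = 1; -- nightly",
    "load_customers_v2.sql": "insert into dw.customers\n"
                             "select id, name from stg.customers where active = 1;",
    "load_orders.sql": "INSERT INTO dw.orders SELECT order_id, total FROM stg.orders;",
}

for left, right, score in similar_pairs(scripts):
    print(f"{left} ~ {right}: {score}")
```

Comparing every pair is quadratic in the number of scripts, so at the scale of thousands of scripts an approximate method such as MinHash would likely be needed; the pairwise loop is kept here only for readability.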

3. Lineage Versioning

It sounds like a straightforward question when anyone asks, “What’s changed in the data lineage?” Yet the answer often requires a subject matter expert who is intimately familiar with the data. The challenge is that relying on SMEs in an ever-evolving, fast-changing data ecosystem is neither scalable nor fast. The task can now be left to data lineage automation, which compares lineage versions so that you can quickly answer that simple question: “What’s changed in the data lineage?”
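
At its simplest, answering “What’s changed?” means comparing two snapshots of the lineage graph. The sketch below assumes each snapshot is a set of (source, target) edges and uses plain set arithmetic; the `LineageDiff` type, the `diff_lineage` function, and the column names are hypothetical and do not represent any product’s API.

```python
from dataclasses import dataclass

Edge = tuple[str, str]  # (source column, target column)


@dataclass(frozen=True)
class LineageDiff:
    added: frozenset[Edge]
    removed: frozenset[Edge]
    unchanged: frozenset[Edge]


def diff_lineage(old: set[Edge], new: set[Edge]) -> LineageDiff:
    """Compare two lineage snapshots expressed as sets of edges."""
    return LineageDiff(
        added=frozenset(new - old),
        removed=frozenset(old - new),
        unchanged=frozenset(old & new),
    )


# Hypothetical snapshots, for illustration only.
last_week = {
    ("stg.orders.total", "dw.orders.total"),
    ("stg.orders.id", "dw.orders.order_id"),
}
today = {
    ("stg.orders.total_usd", "dw.orders.total"),
    ("stg.orders.id", "dw.orders.order_id"),
}

diff = diff_lineage(last_week, today)
for src, dst in sorted(diff.added):
    print(f"+ {src} -> {dst}")
for src, dst in sorted(diff.removed):
    print(f"- {src} -> {dst}")
```

A real comparison would also need to track changes in the transformation logic on each edge, which an edge-set diff alone does not capture.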

The Future: AI and Data Lineage

So, are AI and data lineage a match? Yes. AI completes the data lineage picture by augmenting the manual analysis that humans once had to perform to understand complex data lineage flows. With smart lineage visualization, deep code analytics, and lineage versioning, an organization can quickly uncover the hidden meaning in its metadata.

Curious to see smart lineage visualization, deep code analytics and lineage versioning in action?

Explore MetaDex™ by Compact Solutions and experience a demo of this new combination of AI and Lineage Extraction to enrich your data catalog.



