Metadata is Like Body Language
What’s hiding in each of these pictures?
Hint: It’s not the words they are saying, but the hidden metadata.
The Nonverbal Dilemma
Nonverbal communication is composed of body gestures and vocal inflections. The words you speak are a small fraction of communication. In the 1971 book “Silent Messages” by Albert Mehrabian, the mix of spoken words and nonverbal cues is described by what is referred to as the 7%-38%-55% rule. (source)
Think of it this way. When you put together that fancy new PowerPoint presentation for a meeting, you probably focus the majority of your effort on the words and storyline, but in the end your audience interprets your presentation like this: they base 7% of your likeability on your word choices and a whopping 93% on what we could call the hidden metadata.
The 93% splits into 55% for the visual quality of your presentation and 38% for your vocal tone and inflections.
Think back to a recent business meeting. You were probably reading between the lines intuitively, weighing what was being said verbally against what was being communicated otherwise.
This same problem happens in enterprises with their metadata. Let’s explore how.
The Metadata Conundrum
What’s happening with the metadata in most enterprises? First off, everyone believes it must be governed, and IT often goes to great lengths to document it. In practice this means a tremendous amount of manual or semi-automated effort to create a metadata repository. These projects are done in earnest, but they age quickly and are out of date the day they are completed. The result is declining business and IT value as the metadata goes stale. What’s needed is an automated methodology that continuously refreshes the metadata repository, addressing the root cause of the problem so that the repository stays up to date.
But is that the only root cause of the problem with metadata? Let’s dig deeper into the two common elements of metadata and the third and fourth hidden elements, which are like hidden body language: what metadata reveals is interesting, what it hides is vital.
What are the four elements of understanding metadata?
The first two elements are what most organizations focus on solving. They are easy in principle but complex to automate in practice. The third and fourth elements are the missing link to truly understanding your metadata and, just like body language, among the most important tools for solving business problems. The challenge with the third and fourth elements is that they are complex to solve without automation, so they are often missing from an organization’s metadata repository.
1) Metadata: Location, Location, Location
This is the first element of metadata. Business and IT users want to know the location of their data. Take for example a business analyst or data scientist who spends 60% or more of their time wrangling data. According to Forbes, the breakdown of time looks like this: “Data scientists spend 60% of their time on cleaning and organizing data. Collecting data sets comes second at 19% of their time, meaning data scientists spend around 80% of their time on preparing and managing data for analysis.” (source) If 19% is spent on collecting data sets, any reduction in the time spent finding the right data set leaves more time for analyzing the data. Now picture a business meeting reviewing a new analytical model, where someone asks a simple question: where did you get this data set? Besides the location of the data set, what people are also wondering is whether you chose the right data set for the analysis. Location of data is one element of metadata, but it is only a small part of the picture. Using the 7%-38%-55% analogy, location of data is only one part of the 7%. Let’s explore the second part.
2) Metadata: Schema, Table, Column
Business and IT professionals want to drill down from the location into the schema, table and column attributes for a deeper understanding of the data. This is considered an obvious part of any metadata repository, but the challenge remains how to automate the process. Manual or semi-automated procedures can enrich this data in the metadata repository, but the lingering problem is that schema, table and column information captured this way starts aging as soon as it is published. The combination of location and schema, table, column is easy in principle but often hard to maintain, just like the next element of metadata.
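To make the aging problem concrete, here is a minimal Python sketch of automated schema, table and column extraction. It is an illustration only, not any vendor’s method: it assumes a SQLite database (queried through Python’s standard sqlite3 module), and the function name extract_schema and the customers table are hypothetical. Re-running the extraction and comparing it with the published snapshot shows when the repository has gone stale.

```python
import sqlite3


def extract_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        quoted = '"' + table.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical example: an in-memory database standing in for a real system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

published = extract_schema(conn)   # what the repository documented
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
current = extract_schema(conn)     # what the system looks like now

repository_is_stale = published != current
```

A scheduled job that runs this comparison is one simple way to keep the first two elements current without manual documentation effort.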
Now that we have revealed the location and schema, table, column metadata information, let’s get down to the 93% of information that is hiding in your metadata and drives the most value.
3) Metadata: Source to Target Relationships
This is the first step to conceptually understanding how your data flows throughout your organization. The first two elements collected the metadata, but without knowing how the data traverses your organization’s systems, you are in the dark about how it moves from system to system. In years past you could create manual or semi-automated documentation of source to target lineage, but this is also difficult to maintain. The moment a system changes due to an application upgrade or core system replacement, the lineage is broken. What’s needed is a fluid way to automate the extraction of source to target metadata relationships in order to create self-documenting lineage. You can think of this as: what metadata reveals is interesting (source to target lineage), what it hides is vital. Let’s explore the most important hidden element of metadata, one that is often absent from a metadata repository.
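Source to target relationships can be pictured as a directed graph. The Python sketch below is a simplified illustration under stated assumptions: the data set names in EDGES are invented, and the helper names build_upstream_index and trace_upstream are hypothetical. Given a report, it walks the graph backwards to answer the meeting question of where the data came from.

```python
from collections import defaultdict

# Hypothetical source-to-target edges, as an automated extractor might emit them.
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "warehouse.sales_fact"),
    ("staging.orders", "warehouse.sales_fact"),
    ("warehouse.sales_fact", "reports.revenue_dashboard"),
]


def build_upstream_index(edges):
    """Map each target to the set of sources that feed it directly."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
    return parents


def trace_upstream(dataset, parents):
    """Return every data set that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


lineage = trace_upstream("reports.revenue_dashboard", build_upstream_index(EDGES))
```

Because the graph is rebuilt from whatever edges are extracted, a system upgrade only changes the input list and the lineage re-documents itself on the next run.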
4) Metadata: Transformations
Let’s return to our business meeting, where someone asked: where did you get this data set? A frequent follow-up question is: how was the data transformed? Take for example an ETL process that extracted, transformed and loaded the data into the target system where the data set is located. The answer to how the data was transformed is often elusive, and it takes an expert IT resource to explain the transformation because it is not documented in the metadata repository. The transformations, like body language, are the most important piece in completing the metadata picture. The challenge is that exposing the transformations between source and target systems requires reading the stored procedures and code used in the transformation process. This is rarely documented because any manual or semi-automated way of documenting it is extremely laborious and prone to error. What’s required is automation that reads the code to document the transformations quickly and accurately.
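As a rough illustration of what “reading the code” can mean, the Python sketch below pulls column-level transformations out of a single SQL statement. It is a deliberately naive example under stated assumptions: it only handles a simple INSERT ... SELECT from one table, it would be confused by expressions that contain the keyword FROM, and the table names, column names and the helpers split_top_level and extract_transformations are all hypothetical. A production tool would need a real SQL parser.

```python
import re


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def extract_transformations(sql):
    """Map each target column to the expression that populates it."""
    match = re.search(
        r"INSERT\s+INTO\s+(\S+)\s*\((.*?)\)\s*SELECT\s+(.*?)\s+FROM\s+(\S+)",
        sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a simple INSERT ... SELECT statement")
    target, target_cols, select_list, source = match.groups()
    columns = [col.strip() for col in target_cols.split(",")]
    expressions = split_top_level(select_list)
    if len(columns) != len(expressions):
        raise ValueError("column list and select list differ in length")
    return {
        "source": source.rstrip(";"),
        "target": target,
        "columns": dict(zip(columns, expressions)),
    }


# Hypothetical ETL statement of the kind found in a stored procedure.
ETL_SQL = """
INSERT INTO warehouse.sales_fact (customer_id, full_name, revenue_usd)
SELECT id, UPPER(TRIM(name)), ROUND(amount * fx_rate, 2)
FROM staging.orders
"""

documented = extract_transformations(ETL_SQL)
```

Here documented["columns"] records, for example, that revenue_usd is produced by ROUND(amount * fx_rate, 2), which is exactly the detail that is usually missing from the repository.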
Solving the Metadata Conundrum
So far we’ve painted the picture that metadata is like body language. What it reveals is interesting (source to target lineage). What it hides is vital (transformations). To solve the four elements of metadata (location, schema/table/column, source to target relationships, and the transformations in between), we must stop relying on the manual and semi-automated metadata extraction of the past, which isn’t sustainable, and seek a new approach. Einstein said it best: “We cannot solve our problems with the same thinking we used when we created them.” What’s needed is an approach that holistically extracts all of the information hiding in your metadata, going beyond documenting location and schema/table/column to reveal the true power of understanding your metadata: visualizing source to target lineage with the transformations in between. To further extend the power of metadata, we have partnered with leading data catalog providers, such as Collibra, Informatica and IBM, to enrich the data governance data catalog and business glossary with granular metadata in order to create a Unified Data Governance Console.
Learn more about Compact’s Unified Data Governance Framework™