
Making Your Data Discoverable: How to Start Deriving Value From Your Data

Revelate


The first step in deriving value from data is making it discoverable. It would be nice if enabling data discovery were as simple as saying, “Here are all the schema diagrams for our databases. Let us know what you need!” But in the real world, data discovery involves much more than data availability. Data catalogs exist to help solve this problem, but even the most sophisticated catalog tools can’t magically reveal where value is hidden in a vast data ecosystem.

Meaningful and effective data discovery comes down to one thing: metadata. Metadata isn’t just “data about your data.” Used well, it can break down data silos, help decision-makers become more data-driven, and eliminate unnecessary meetings.

For example, data discovery often happens in face-to-face conversations. Someone says, “I need this type of data. Do you know where I can get it?” The other person says, “I think so. Let me check.” This process can take anywhere from minutes to weeks, and the final answer could be anything from “yes” to “no” to “I don’t know” to “we have it, but….”

Data-mature companies handle this more efficiently, but most companies aren’t there yet. It can take months, or even years, for substantive use of metadata-driven data discovery to shift an organization’s culture. Even at data-mature enterprises, data can be locked into lines of business (or some equivalent vertical silo), and unlocking it can be painful. Those enterprises may have the tools but lack the processes or policies they need to share data.

Well-maintained, accurate, and thorough metadata can solve these problems quickly and efficiently. A metadata-driven discovery system can end the inefficient ask-and-see-if-it’s-available discussions that are so commonplace. Not only can someone see if the data is available, but solutions like a catalog or a marketplace can immediately provide a way of accessing it.

How and why data discovery efforts work

When organizations become mature and capable enough to go down the path of data discovery, they usually do so in one of these two ways:

  1. Gather data from various sources to augment internal data. The process begins with extracting data from multiple sources and consolidating, classifying, and organizing the retrieved data into a single area for evaluation.
  2. Compare and contrast outside data with internal data. Organizations can get a holistic view of their business performance by benchmarking it against an entire industry or vertical rather than relying on internal information alone.
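
The first approach, which gathers data from many sources into one place for evaluation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product. The source names, table listings, and the internal/external tag are all hypothetical. A real system would also read table and column names from each source's own schema catalog rather than from a hard-coded dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One entry in a consolidated inventory (hypothetical structure)."""
    source: str                              # where the dataset was found
    name: str                                # table or dataset name
    columns: list                            # column names, sorted
    tags: set = field(default_factory=set)   # classification labels


def consolidate(sources, external_sources=frozenset()):
    """Merge per-source table listings into one inventory keyed by 'source.table',
    tagging each record as internal or external by where it came from."""
    inventory = {}
    for source, tables in sources.items():
        origin = "external" if source in external_sources else "internal"
        for table, columns in tables.items():
            inventory[f"{source}.{table}"] = DatasetRecord(source, table, sorted(columns), {origin})
    return inventory


def group_by_tag(inventory):
    """Organize the inventory by tag so each group can be evaluated together."""
    groups = {}
    for key, record in inventory.items():
        for tag in record.tags:
            groups.setdefault(tag, []).append(key)
    return groups


# Hypothetical sources: two internal databases and one outside industry feed.
sources = {
    "crm_db": {"customers": ["id", "email", "region"]},
    "sales_db": {"orders": ["order_id", "customer_id", "amount"]},
    "industry_feed": {"benchmarks": ["region", "avg_revenue"]},
}
inventory = consolidate(sources, external_sources={"industry_feed"})
groups = group_by_tag(inventory)
# groups == {"internal": ["crm_db.customers", "sales_db.orders"],
#            "external": ["industry_feed.benchmarks"]}
```

The shared "region" column in the internal and external tables is the kind of overlap that makes the second approach, comparing outside data with internal data, possible.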

The data discovery process gives a company visibility into its data. With that visibility, the company can determine how to apply the appropriate levels of security and privacy across its data ecosystem. That, in turn, makes compliance with regulations like GDPR and CCPA easier and more effective to manage.
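
As a minimal sketch of finding where privacy controls might apply, the snippet below flags columns whose names suggest personal data. It relies on a simple name-based heuristic with hypothetical patterns. It is not an AI/ML classifier, and it will miss or mislabel some columns. Flagging a column is also not, by itself, a compliance determination under GDPR or CCPA.

```python
import re

# Hypothetical column-name patterns that often indicate personal data.
# Real discovery tools also inspect the values, not just the names.
PERSONAL_DATA_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "person_name": re.compile(r"(first|last|full)[-_]?name", re.IGNORECASE),
    "birth_date": re.compile(r"birth|dob", re.IGNORECASE),
}


def flag_personal_columns(columns):
    """Return {column: label} for columns whose names match a pattern.

    The first matching pattern wins; unmatched columns are omitted.
    """
    flagged = {}
    for column in columns:
        for label, pattern in PERSONAL_DATA_PATTERNS.items():
            if pattern.search(column):
                flagged[column] = label
                break
    return flagged


flags = flag_personal_columns(["id", "Email", "first_name", "region", "DOB"])
# flags == {"Email": "email", "first_name": "person_name", "DOB": "birth_date"}
```

Flagged columns would then become candidates for masking or restricted access. The actual policy remains a governance decision.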

Implementing a successful data discovery practice can help a company understand how data moves across the organization and provide clarity on curation, governance, and data accessibility.

Discovery projects can be completed manually or with automation. The manual approach requires people with data expertise who understand the contexts in which the data is used. The automated approach depends heavily on AI and ML to gather, interpret, and present actionable insights.

Metadata management in a marketplace

A metadata-focused data marketplace can be an excellent way to manage metadata for large volumes of data and data products. Though data catalogs have made more headway as metadata management platforms for big datasets, some marketplaces (like Revelate’s) offer powerful, advanced metadata tools that enable data discovery.

In a marketplace context, there are two types, or levels, of metadata:

  • Macro metadata, which describes the data product.
  • Micro metadata, which relates to the data itself.
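
One way to picture the two levels is a product record that keeps them separate. This sketch uses a hypothetical product and hypothetical field names. Macro metadata is a dictionary about the product as a whole. Micro metadata is a dictionary for each column of the underlying data.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataProduct:
    name: str
    # Macro metadata: describes the data product as a whole.
    macro: dict[str, Any] = field(default_factory=dict)
    # Micro metadata: describes the data itself, keyed here by column name.
    micro: dict[str, dict[str, Any]] = field(default_factory=dict)


product = DataProduct(
    name="Historical Commodity Prices",  # hypothetical product
    macro={"provider": "Example Data Co.", "coverage": "1970-1990", "update_frequency": "monthly"},
    micro={
        "trade_date": {"type": "date", "description": "Trading day"},
        "close_price": {"type": "decimal", "unit": "USD"},
    },
)
```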

Metadata can be assigned automatically by the platform, through automated workflows, or manually by a product administrator. It can be shared across product categories and individual products. We believe metadata should be descriptive, but providers are free to create non-descriptive metadata.
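
Sharing metadata across categories and products raises a question: which value wins when both define the same field? This minimal sketch assumes a hypothetical rule in which more specific layers override more general ones, running from platform defaults to category to product. It uses the standard library's collections.ChainMap to apply that rule.

```python
from collections import ChainMap


def effective_metadata(*layers):
    """Merge metadata layers ordered from most general to most specific.

    ChainMap looks keys up in its maps from left to right, so the layers
    are reversed to let the most specific layer win.
    """
    return dict(ChainMap(*reversed(layers)))


platform_defaults = {"license": "standard"}                # assigned by the platform
category = {"domain": "finance", "license": "commercial"}  # shared by the category
product = {"coverage": "1970-1990"}                        # set on one product

merged = effective_metadata(platform_defaults, category, product)
# merged == {"license": "commercial", "domain": "finance", "coverage": "1970-1990"}
```

Keeping the layers separate and merging them on read has a useful side effect. An edit to a category's metadata reaches every product in that category the next time the merge runs.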

Metadata can be used in various ways but is particularly useful for discovery. For example, a consumer searching for historical stock market data might enter the query “1974 commodity stock pricing.” The marketplace then searches product metadata for close matches to that query and presents the related products to the consumer.
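
A minimal sketch of that matching step scores each product by how many query words appear in its metadata values. This token-overlap scoring is an assumption made for illustration, not how Revelate's search works. Production search engines add stemming, synonyms, and relevance ranking such as BM25. The product IDs and metadata below are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word/number tokens; no stemming, so "commodities" != "commodity"."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, products, min_score=1):
    """Rank products by how many query tokens appear in their metadata values."""
    query_tokens = tokenize(query)
    scored = []
    for product_id, metadata in products.items():
        metadata_tokens = set()
        for value in metadata.values():
            metadata_tokens |= tokenize(str(value))
        score = len(query_tokens & metadata_tokens)
        if score >= min_score:
            scored.append((score, product_id))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [product_id for _, product_id in scored]


# Hypothetical marketplace products and their metadata.
products = {
    "historical-commodities": {
        "title": "Historical commodity pricing",
        "keywords": "stock market, commodities, 1970s, 1974",
    },
    "equity-daily": {
        "title": "Daily equity prices",
        "keywords": "stock market, equities",
    },
    "weather-archive": {
        "title": "Weather observations",
        "keywords": "temperature, rainfall",
    },
}
results = search("1974 commodity stock pricing", products)
```

With these products, the commodity product ranks first because it matches all four query words. The equity product comes second because it matches only "stock". The weather archive does not match at all.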

In most systems, metadata is stored as key-value pairs. The Revelate marketplace places no limits on the metadata providers can assign to products, categories, and data; it can be as deep and complex as a provider wishes. The metadata framework is open, so providers can create whatever fields they want.
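
Providers can define arbitrary and nested fields, yet a system still needs a uniform key-value view to index and search them. One common approach flattens nested metadata into dotted keys. The sketch below uses hypothetical field names and does not describe Revelate's storage model.

```python
from typing import Any


def flatten(metadata: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested metadata into dotted keys.

    {"source": {"exchange": "X"}} becomes {"source.exchange": "X"}.
    Lists and other non-dict values are kept as-is.
    """
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical provider-defined metadata with a nested structure.
metadata = {
    "title": "Historical commodity pricing",
    "source": {"exchange": "Example Exchange", "license": {"type": "commercial", "seats": 5}},
    "tags": ["commodities", "1970s"],
}
flat = flatten(metadata)
```

This scheme assumes that provider-defined keys never contain a dot themselves. If they might, a different separator or a tuple path would avoid collisions.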

Metadata features in the marketplace

Data providers primarily use metadata capabilities in a marketplace for defining and configuring data products. As such, a mature marketplace platform should support a comprehensive set of metadata management features like:

  • Metadata replication: The ability to reuse metadata across multiple products and categories.
  • APIs: Get, set, modify, and delete metadata in an automated way.
  • Workflows: Apply metadata in automated product manufacturing processes.
  • Import capabilities: Use spreadsheets or other files to manipulate metadata, like bulk imports for multiple products.
  • Supported data types: Use any data type in product and category metadata fields.
  • Schema enforcement: Control how strictly or flexibly metadata is applied in product and category configurations.
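
Two of these features, bulk import and schema enforcement, fit together naturally. The sketch below reads one metadata record per CSV row and checks each record against a schema in either a strict or a flexible mode. The schema, the field names, and the exact meaning of each mode are assumptions made for illustration.

```python
import csv
import io

# Hypothetical schema for product metadata: field name -> expected type.
SCHEMA = {"product_id": str, "title": str, "coverage_start": int}


def import_metadata(csv_text, strict=True):
    """Bulk-import one metadata record per CSV row.

    Both modes require every schema field and check its type.
    strict=True also rejects fields the schema does not define;
    strict=False keeps them as free-form metadata.
    Returns (records, errors), where errors holds (row_number, problems).
    """
    records, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        record, problems = {}, []
        for name, expected_type in SCHEMA.items():
            raw = row.get(name)
            if not raw:  # column absent or value empty
                problems.append(f"missing {name}")
                continue
            try:
                record[name] = expected_type(raw)
            except ValueError:
                problems.append(f"{name}: expected {expected_type.__name__}, got {raw!r}")
        extras = {k: v for k, v in row.items() if k not in SCHEMA}
        if strict and extras:
            problems.append(f"unknown fields: {list(extras)}")
        else:
            record.update(extras)
        if problems:
            errors.append((row_number, problems))
        else:
            records.append(record)
    return records, errors


csv_text = """product_id,title,coverage_start,region
P-001,Historical commodity pricing,1970,global
P-002,Daily equity prices,not-a-year,north_america
"""

records, errors = import_metadata(csv_text, strict=False)
# records == [{"product_id": "P-001", "title": "Historical commodity pricing",
#              "coverage_start": 1970, "region": "global"}]
# errors == [(2, ["coverage_start: expected int, got 'not-a-year'"])]
```

In strict mode, the same file produces no records, because the undeclared "region" column is rejected in every row.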


Marketplaces are not data catalogs or full-fledged metadata management systems (those are dedicated products with their own niche markets). However, there is plenty for a data provider to do with a sufficiently large set of metadata capabilities in a marketplace.

Metadata affects revenue generation

You will not make money if people can’t find your data product. It’s the same problem that search engine optimization, keyword targeting, and other marketing techniques try to solve, and it can be just as difficult to pin down.

Having a great data product often isn’t enough. As we discussed in the data productization chapter, even a great data product needs well-crafted metadata to be discoverable. If someone searches for “1974 stock market data” and nothing in the product metadata uses similar keywords, that consumer won’t purchase your stock market data product, even if it provides precisely what they need.

It’s great to have data products in a data marketplace, but the metadata gets the product in front of people’s eyes. What good is data de-siloing if no one can find or use the data anyway?

Takeaways

  • Data discovery is the first step on the data acquisition journey.
  • Discoverability is enabled by metadata attached to the data product.
  • A successful data discovery practice can help a company understand how data moves across the organization and provide clarity on curation, governance, and data accessibility.
  • In a marketplace context, there are two types, or levels, of metadata: macro metadata, which describes the data product, and micro metadata, which relates to the data itself.
  • Data providers primarily use metadata capabilities in a marketplace for defining and configuring data products.
  • Data product discoverability has a direct effect on the revenue-generating capability of a product. If consumers can’t find the product, they won’t buy it.