Building an Internal Data Marketplace with Snowflake and Revelate

Revelate

A big idea like “internal data sharing” can feel deceptively simple. Organization-wide information exchange empowers teams to discover new insights and create new efficiencies, and there are plenty of tools available to support it, so how hard can it be? In practice, businesses still struggle to select, implement, and benefit from those tools.

Internal data sharing allows users to share data resources across business functions, departments, users, applications, and geographies. Organizations know their data is tremendously valuable, but existing technologies, company culture, and a lack of effective data management practices are often significant barriers to implementing internal data sharing.

What Is an Internal Data Marketplace?

An internal data marketplace (IDM) is an online platform that makes both the organization’s data sets and any third-party external data assets available to internal users. IDMs offer both data providers and internal users a single platform to collaborate and create around data. Internal platforms enable teams to streamline their data catalog and make data assets available while also providing controlled access based on user roles and permissions. 

IDMs also:

  • Offer teams a unified and consistent platform for cataloging, categorizing, and making data accessible
  • Provide stakeholders a complete view of data available across applications, clouds, databases, and data centers
  • Enforce the consistent quality and interoperability standards required to speed time to insight
  • Enable compliance and security teams to manage security and data governance

Internal data marketplace use cases

Internal marketplaces can serve many functions. Here are just a few examples of what kinds of data IDMs hold and use cases for sharing such data:

  1. Product development: Internal data accessed through a marketplace can provide teams with access to customer feedback, usage data, and market research. This can help product teams validate assumptions, gather insights, and drive meaningful, iterative improvements to their products and services. 
  2. Data-driven decision-making: Data that provides insights into operational efficiencies, employee performance, and market research can assist executives in making strategic decisions about new markets, operational improvements, and staffing methods.
  3. Business analytics: An internal data marketplace can enable teams to access and analyze data for business intelligence. Marketing teams might access customer data to analyze trends and inform future marketing decisions, and finance teams might access sales data to enhance budgeting, forecasting, and profitability analysis.

Elements of an internal marketplace

An internal data marketplace formalizes standards, controls, and “buying and selling” workflows across the organization. Internal data marketplaces typically have the same core elements, which include: 

  1. One or more unifying schemas that organize metadata to contextualize how data is organized and defined
  2. Business context around data sets that establishes why the organization or producer thinks it might be helpful
  3. A global data catalog that captures all available data and available metadata
  4. A mechanism for discovering, ordering, and fulfilling specific data objects for consumers
  5. A framework for maintaining unified visibility and control over information and consumers
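
As a rough illustration of how these elements fit together, the sketch below models a tiny in-memory catalog in Python. Every name in it (`DataProduct`, `Catalog`, `publish`, `search`, `request_access`) is hypothetical and is not part of Snowflake or Revelate; a real marketplace would back this with a metadata store and an approval workflow.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """One catalog entry: descriptive metadata plus business context."""
    name: str
    owner: str
    description: str                      # business context: why this data is useful
    tags: set[str] = field(default_factory=set)


class Catalog:
    """Hypothetical global catalog supporting discovery and ordering."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}
        # (consumer, product name) pairs, kept for visibility and oversight
        self.orders: list[tuple[str, str]] = []

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, keyword: str) -> list[str]:
        """Discovery: match the keyword against names, descriptions, and tags."""
        kw = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if kw in p.name.lower()
            or kw in p.description.lower()
            or kw in {t.lower() for t in p.tags}
        )

    def request_access(self, consumer: str, product_name: str) -> bool:
        """Ordering: record the request so fulfillment can follow."""
        if product_name not in self._products:
            return False
        self.orders.append((consumer, product_name))
        return True


catalog = Catalog()
catalog.publish(DataProduct("sales_daily", "finance",
                            "Daily sales totals for forecasting", {"sales", "finance"}))
catalog.publish(DataProduct("product_usage", "product",
                            "Feature usage events for product teams", {"usage"}))

print(catalog.search("sales"))                             # ['sales_daily']
print(catalog.request_access("marketing", "sales_daily"))  # True
```

The point of the sketch is the separation of concerns: metadata and business context live on the entry, discovery runs over the catalog, and every order leaves a record that governance teams can review.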

An IDM allows teams to continue to work inside data environments that have been carefully optimized for their technical and business needs. This level of abstraction is always important: convenience can’t ever eclipse compliance.

 

The Benefits of Snowflake as an Internal Data Marketplace

From product usage statistics to network traffic data to hardware energy consumption, businesses are accumulating unprecedented amounts of data. Traditional structured data tooling and databases have given way to new solutions designed to help unlock insights from semi- and unstructured data. An internal data marketplace must accommodate all these data types, along with their associated solutions stack.

What is Snowflake?

Snowflake is a cloud-based data warehousing platform that is designed to handle large-scale data storage and analytics. Its architecture is highly scalable and flexible, making it ideal for storing, processing, and analyzing data. Some key characteristics of Snowflake that make it an attractive choice include:

  • Cloud-native architecture that allows for automatic scalability, elasticity, and high availability
  • Robust features to ensure data security and governance
  • Support for diverse data types
  • Integration with common business intelligence tools, ETL/ELT platforms, data pipelines, and machine learning frameworks

The need to balance performance with dynamic flexibility is one reason organizations consider Snowflake’s data-as-a-service platform an ideal IDM. The Snowflake Data Cloud lets teams build a consolidated data layer across previously disconnected data sources. They can then deliver that centralized view to internal consumers across the organization.

Let’s explore some of the other benefits of Snowflake.

Easy (and Economic) Scalability

IDMs must grow with demand. Snowflake is relatively easy to scale up or out as needs evolve. From a service design perspective, customers can either resize an existing virtual warehouse (scale up) to speed up heavy queries or bring additional compute clusters online (scale out) to boost concurrency. Additionally, the Snowflake infrastructure separates storage and compute, giving businesses the ability to quickly and precisely scale one or the other as the business requires.

For example, suppose a business wants to run a series of complex analytics queries on its sales data to gain insights into customer behavior. These queries require significant computational resources. In Snowflake, the team can create a virtual warehouse dedicated to this workload. The storage layer stores and manages the data, while the compute layer supplies the processing resources. When the team submits an analytics query, the virtual warehouse retrieves the relevant data from the storage layer and processes it with its own allocated compute.

This separation lets businesses scale each component independently and add compute capacity by bringing more virtual warehouses online. Parallel query execution improves performance, and because compute is billed only while a warehouse is running, the company can avoid paying for idle resources.

Essential Flexibility

Marketplaces must accommodate all kinds of buyers and sellers. Snowflake consolidates data sources, including OLTP databases, web apps, logging, and M2M/IoT data sets, with support for ETL, ELT, and/or streaming. The cloud-agnostic design means you can build across multiple instances, leveraging all the cloud-side tools and services a modern CDP might require.

Robust Performance

Marketplaces must be open for business when it counts. Snowflake gives organizations the ability to run a nearly limitless number of concurrent workloads—imagine a busy reservation or transaction system. This ability is especially important when designing for the mission-critical databases and data applications enterprises increasingly rely on to get and stay competitive.

Integrated Security

Marketplaces must never put collective assets at risk. Snowflake enables secure sharing between data sources and consumers by embedding granular reporting and controls into the infrastructure at every level. This feature means security and compliance teams can manage role-based access, policy, auditing, and compliance certifications.
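
To make role-based access a little more concrete, here is a minimal Python sketch of the general idea: privileges are granted to roles, roles are granted to users, and every decision is logged for auditing. The role names, user names, and the `is_allowed` helper are invented for illustration. This is not Snowflake's API, only a model of the pattern.

```python
# Hypothetical grants: each role maps to a set of (privilege, object) pairs.
ROLE_GRANTS = {
    "analyst": {("SELECT", "sales_daily")},
    "data_engineer": {("SELECT", "sales_daily"), ("INSERT", "sales_daily")},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "ana": {"analyst"},
    "eli": {"data_engineer"},
}

# Every access decision is appended here so it can be audited later.
audit_log = []


def is_allowed(user, privilege, obj):
    """Return True if any role held by the user grants the privilege on obj."""
    roles = USER_ROLES.get(user, set())
    allowed = any((privilege, obj) in ROLE_GRANTS.get(role, set()) for role in roles)
    audit_log.append((user, privilege, obj, allowed))
    return allowed


print(is_allowed("ana", "SELECT", "sales_daily"))  # True
print(is_allowed("ana", "INSERT", "sales_daily"))  # False
```

Because users never receive privileges directly, changing what a whole team can see is a single change to the role rather than an edit per person.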

What Holds Snowflake IDMs Back

The ability to create a consistent, comprehensive view of data across the organization is fundamental to Snowflake’s ability to help organizations build and manage their internal data marketplaces. While its unique infrastructure and highly customizable applications are an ideal technical fit, there are limitations to consider.

Snowflake-Only Data Sharing

One obvious limitation of deploying standalone Snowflake as an IDM is that sharing is confined to the platform itself. If, for whatever reason, an internal organization or partner isn’t building on Snowflake, bringing data to market gets much more complicated. Differences in data formats, data quality, and data synchronization can require complicated conversions and integrations, and can leave data inconsistent. It also means stakeholders must be Snowflake users to take advantage.

Integrations Required

Extending Snowflake to third-party data providers and services relies on integration. Stakeholders must confirm that new partners and sources are “Snowflake-enabled” so they can build their IDM as expected.

Unpredictable Economics

Snowflake’s data-as-a-service model can lead to unexpected, hard-to-predict costs. This can happen when a data warehouse scales and workload prices increase, or when unplanned data transfer costs emerge.

Organizations need proper oversight of IDM operations to confirm they are using services efficiently and effectively, especially given how easy it is to scale capacity.
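
A quick back-of-the-envelope calculation shows how fast scaling decisions compound. The Python sketch below assumes the doubling credit rates Snowflake documents for standard warehouse sizes (1 credit per hour for X-Small, 2 for Small, and so on); the `estimate_cost` helper and the 3.00 price per credit are placeholders, since actual pricing depends on edition, region, and contract. Check current Snowflake pricing before relying on any of these numbers.

```python
# Credits consumed per hour, per cluster, for standard warehouse sizes.
# Treat these as assumptions to verify against current documentation.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}


def estimate_cost(size, hours, clusters=1, price_per_credit=3.0):
    """Rough compute cost: credits/hour x running hours x clusters x price.

    price_per_credit is a placeholder value, not a quoted rate.
    """
    credits = CREDITS_PER_HOUR[size] * hours * clusters
    return credits * price_per_credit


baseline = estimate_cost("SMALL", hours=100)            # 2 * 100 * 1 * 3.0
scaled = estimate_cost("LARGE", hours=100, clusters=2)  # 8 * 100 * 2 * 3.0

print(baseline)           # 600.0
print(scaled)             # 4800.0
print(scaled / baseline)  # 8.0
```

Two sizes up and one extra cluster multiplies the bill by eight for the same running hours, which is why usage monitoring belongs in the IDM plan from day one.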

Persistent Complexity

Snowflake is designed as a service—consumers just pay and go. However, there’s still a lot of work that goes into building the initial Snowflake architecture, from building warehouses to managing metadata and the catalog. The same goes for building the initial Snowflake queries. It’s a lot of intervention for organizations looking to build a mostly friction-free IDM.

Narrow Customization

The catalog is the data marketplace’s unifying artifact. The ability to clearly describe data on offer is important, especially when trying to monetize inside a formal marketplace. Snowflake’s metadata capabilities don’t make it easy for organizations to achieve the last-inch customization and optimization required for “retail” data sharing.

A Better Stack: Supercharge Your Snowflake IDM with Revelate

Luckily, organizations working towards a formalized, Snowflake-driven IDM have a solid option for solving some of the gaps and shortfalls we discuss above. Adding Revelate to the mix allows organizations to leverage Snowflake’s core data sharing capabilities while building an IDM that extends its reach. This approach also means enterprise-ready data product and monetization opportunities are closer than ever.

Where Snowflake excels at data aggregation and operations, Revelate is precisely focused on what it takes to create compelling data products internal consumers will value. This focus means helping the organization look beyond the limits of the Snowflake environment to ensure your internal marketplace is always as well-stocked as possible.

Combining Revelate and Snowflake will let you:

  • Expand your catalog and reach new consumers by including sources and users outside your Snowflake environment, and even enable intra-account sharing not natively possible
  • Offer truly self-service data discovery and procurement experiences for internal data consumers
  • Build a culture of data sharing across the organization with Revelate’s collaboration features, which let users share real insights and ideas, not just raw data
  • Customize metadata tagging and display to craft more effective data presentations and offer customized, easy-to-buy data products to consumers across the organization
  • Strengthen data governance and oversight by layering Revelate’s authorization and entitlement tools on top of Snowflake’s granular access and permissions tools
  • Demonstrate data lineage and compliance with easy-to-understand Revelate data reporting focused on data product quality

Finally, and this benefit is huge, Revelate moves you towards greater adoption of data sharing as a core business advantage by enabling simple, speedy self-service data consumer experiences. Find the data you need, pay for it in your favorite e-comm platform, and get to work.

Beyond the Tech: Best Practices Make All the Difference

No matter exactly what your total tech stack looks like, mature best practices can go a long way toward making your Snowflake data marketplace successful. As with any successful B2B data marketplace, these principles are mostly about keeping producers incentivized and customers happy.

Metadata Defines the World

The initial metadata schema defines the known data world. It helps teams tag and organize their data in ways that make sense for their business needs. As those needs evolve, the metadata must keep up, which makes that first metadata mapping very important.

Bad Products Don’t Create Return Customers

Optimize for quality and accuracy above all else. Both are critical to the success of any internal data marketplace. Consumers must see data as a resource they can trust and act on.
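
One simple way to put this into practice is a quality gate that runs before a data set is listed. The Python sketch below checks completeness, meaning the share of rows with every required field populated. The `completeness` helper, the sample rows, and the 95% threshold are all illustrative assumptions; real checks would also cover accuracy, freshness, and duplicates.

```python
def completeness(rows, required):
    """Fraction of rows in which every required column has a value."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(col) not in (None, "") for col in required)
        for row in rows
    )
    return complete / len(rows)


# Made-up sample data: the second row is missing its region.
rows = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "", "amount": 80.0},
    {"order_id": 3, "region": "APAC", "amount": 45.5},
    {"order_id": 4, "region": "AMER", "amount": 0.0},
]

score = completeness(rows, required=["order_id", "region", "amount"])
ready_to_publish = score >= 0.95  # hypothetical publishing threshold

print(score)             # 0.75
print(ready_to_publish)  # False
```

Note that a legitimate zero, like the amount in the last row, still counts as populated; only missing or empty values lower the score.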

Buzz Matters

As with all tech initiatives, internal data marketplaces are only effective if they’re widely adopted. An internal marketplace rollout should be just like any other product launch: the right people must know and care.  

This means engaging the same traditional stakeholders you’d need for any data conversation, and it can also include product and innovation teams who might benefit from the new marketplace.

What’s coming next?

Even as data piles up at an outlandish pace, technological advances aren’t slowing down. Fields like machine learning and artificial intelligence continue to transform on a near-weekly basis, and all these advances will impact how you build and scale your internal data marketplace.

The recent announcements around AI, ChatGPT, and LLMs are showing us the edges of what an AI-assisted data marketplace might look like, with highly customized recommendations and deep insights available at the ready.

We’re also seeing continuous improvements achieved inside DataOps, as providers find smarter ways to get raw data ready for sharing and consumption. Organizations will expect these advantages to show up in their data platforms and marketplaces in the near future.

Finally, the *aaS paradigm might eventually include data marketplaces as a service, where more and more of the necessary infrastructure and applications are delivered from the cloud. The related data mesh paradigm looks to simplify how organizations make, manage, and share data products. Both moves are aimed at the same goal: making data exchange very easy, inside or outside your organization.

Whether it’s a new data source or a new strategy for secure sharing, the future of data is forever in motion. Every single step forward will have an eventual impact on your ability to build and scale the internal data marketplaces your organization can use to continuously move from insight to opportunity—and Revelate is ready to help.

Unlock Your Data's Potential with Revelate

Revelate provides a suite of capabilities for data sharing and data commercialization for our customers to fully realize the value of their data. Harness the power of your data today!
