Data Procurement

Data Procurement: How Pricing and Distribution Methodology Are Inextricably Linked


Continuing our discussion of the data buying and selling process, this post begins our look at Procurement, an umbrella topic that includes billing, licensing and audit. We start with billing, and specifically with approaches to setting a viable price for data services and how that price may be influenced by the design characteristics of the data and the delivery methodology employed.

The Procurement process involves many moving parts, which makes it perhaps the most nuanced stage in the data buying journey. For data sellers, pricing may be the trickiest challenge within this stage of the commercial workflow.

Just as the delivery options available to the data seller depend on the characteristics of the dataset, pricing may be influenced by at least some of the five Vs of Big Data: in this case volume, velocity and some notion of value. Value may be determined by a number of factors, including cleanliness, accuracy, frequency and granularity, as well as any value-added insights the data contains. Data consumers may also consider the credibility of the data supplier, the commonality of the data product and its relative importance to the purpose for which it will be used.

Whatever the perceived value, though, the distribution model that the seller selects on the basis of the dataset's characteristics may in many cases limit the range of pricing options available. In short, the pricing model that can be employed may be closely tied to the distribution models discussed in our last blog. Whether it's a one-time historical, subscription, on-demand or services model, each has its own implications for pricing.

Take the one-time historical access and subscription delivery models, for instance. Both give ‘ownership’ (or at least control) of the data to the client or buyer, leaving the data producer with no visibility into what the client does with the data they receive. As a result, billing options for sellers are limited; they are unable to offer consumers a pay-as-you-go option, for example.

With these delivery models, the value of the dataset rests entirely on the buyer's own perception, and as a result it is difficult for the seller to ascertain. In these cases, the data seller may have to employ traditional processes such as interviewing clients or target prospects in order to understand the value of the data to the client and identify the optimal price for the service.

Another option in these instances is for the seller to assign a price based on the volume or throughput of data being provided. This may not be as straightforward as it first appears, however. It's a truism in data sales that sellers derive less revenue per row of data as the size of the dataset on offer gets larger. Because of this inverse relationship between volume and the incremental price of data, each additional row needs to bring more value to the consumer if the seller is to be able to charge more for it.
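To make the volume effect concrete, here is a minimal Python sketch of graduated per-row pricing. The tier boundaries, the rates and the function names are all hypothetical, chosen purely for illustration rather than taken from any real price list.

```python
# Hypothetical tiers: (upper row bound of the band, price per row inside the band).
TIERS = [
    (1_000_000, 0.0010),
    (10_000_000, 0.0004),
    (float("inf"), 0.0001),
]


def volume_price(rows: int) -> float:
    """Total price when each band of rows is charged at its own rate."""
    total = 0.0
    lower = 0
    for upper, rate in TIERS:
        if rows <= lower:
            break  # no rows fall into this band or any later one
        total += (min(rows, upper) - lower) * rate
        lower = upper
    return total


def average_price_per_row(rows: int) -> float:
    """Revenue per row for a dataset of the given size."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return volume_price(rows) / rows
```

Under these assumed tiers, the average price per row falls steadily as the dataset grows, which is the pattern described above: the seller earns more in total, but less for each additional row.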

This can make big data sets challenging for the seller, and size has further implications for the provider. The seller may have to store terabytes or even petabytes of data, which adds to the cost of providing the service with no guarantee that the investment will be recouped. Infrastructure cost is therefore a significant factor in determining price and, indeed, in assessing the viability of a data service.
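A rough way to test viability is to ask how many paying subscribers it takes to cover the monthly infrastructure bill. The sketch below assumes a simple cost model (storage charged per terabyte plus a fixed monthly overhead) and a single flat subscription price; the function name, parameters and any figures passed in are illustrative assumptions, not a recommended model.

```python
import math


def breakeven_subscribers(storage_tb: float,
                          cost_per_tb_month: float,
                          fixed_monthly_cost: float,
                          price_per_subscriber: float) -> int:
    """Smallest number of subscribers whose fees cover monthly infrastructure cost."""
    if price_per_subscriber <= 0:
        raise ValueError("price_per_subscriber must be positive")
    # Assumed cost model: storage billed per TB, plus a fixed overhead.
    monthly_cost = storage_tb * cost_per_tb_month + fixed_monthly_cost
    return math.ceil(monthly_cost / price_per_subscriber)
```

If the break-even count is larger than the realistic pool of buyers, the service is unlikely to pay for itself at that price, however valuable the data may be.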

Now let's look at the on-demand delivery model. Here, the onus for delivery is on the provider's side, giving the seller more control and opening up options for different pricing models. Using on-demand, the provider is able to generate insights into buyers' data usage: which client teams use it, when they connect, which data elements they access, and so on. From this, the provider is able to build models and develop pay-as-you-use pricing formulas, whether the cost is based on the number of rows of data scanned, the volume of data downloaded or the number of flat-file requests.
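A pay-as-you-use formula of this kind can be sketched in a few lines of Python. The three meters below mirror the ones just mentioned (rows scanned, volume downloaded, flat-file requests); the rate values, the `UsageRates` class and the `usage_charge` function are hypothetical placeholders rather than any provider's actual tariff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRates:
    """Hypothetical unit rates for each metered quantity."""
    per_million_rows_scanned: float = 2.00
    per_gb_downloaded: float = 0.50
    per_file_request: float = 0.10


def usage_charge(rows_scanned: int,
                 gb_downloaded: float,
                 file_requests: int,
                 rates: UsageRates = UsageRates()) -> float:
    """Sum the three metered components and round to two decimal places."""
    total = (
        rows_scanned / 1_000_000 * rates.per_million_rows_scanned
        + gb_downloaded * rates.per_gb_downloaded
        + file_requests * rates.per_file_request
    )
    return round(total, 2)
```

Keeping the rates in one small object makes it easy to quote different tariffs to different clients without touching the formula itself.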

But capturing the metrics required to make sense of these usage patterns can represent a challenge in itself. When a consumer connects to the host server, it’s usually evident and easily identified. But to monitor data transfers, providers need access to networking logs, potentially across multiple users. To take full advantage of the on-demand delivery model, data sellers often need help to set up and operate the technology platforms needed.
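Once transfer records have been collected, rolling them up into per-client usage figures is the simpler part of the job. The sketch below assumes each log entry has already been parsed into a dictionary with the hypothetical keys `client`, `team` and `bytes_out`; real networking logs will have their own formats and will need their own parsing.

```python
from collections import defaultdict


def summarize_usage(records):
    """Total bytes transferred per (client, team) pair.

    `records` is any iterable of dicts with the assumed keys
    'client', 'team' and 'bytes_out'.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["client"], rec["team"])] += rec["bytes_out"]
    return dict(totals)
```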

Another consideration here is that many data consumers may not want their suppliers to know what they are doing with the data, since this may represent a ‘secret sauce’ element of their overall business. Usage-based models can therefore impair data owners' ability to sell the data in the first place, as buyers are reluctant to let sellers see their activities.

This quandary illustrates the need for buyers and sellers to maintain a non-adversarial relationship. Sellers need buyers to trust that they will exercise discretion around their data usage. But this works both ways: sellers also need to trust that buyers will be good actors with respect to usage of their data, complying fully with the terms of their data licences.

Buyers of on-demand services that are priced on a usage basis need some level of cost predictability. No one wants a call from a CFO surprised by unexpectedly high invoices for on-demand data services. Buyers, and by extension sellers, need to be aware of the dangers of unrestricted usage and ring-fence access accordingly.
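One common way to give buyers that predictability is to agree a monthly cap and warn before it is reached. The following is a minimal sketch of that idea in Python; the function names and the 80% alert threshold are assumptions made for illustration.

```python
def capped_charge(metered_charge: float, monthly_cap: float) -> float:
    """Bill the metered amount, but never more than the agreed cap."""
    return min(metered_charge, monthly_cap)


def should_alert(spend_to_date: float,
                 monthly_cap: float,
                 threshold: float = 0.8) -> bool:
    """True once spend reaches the given fraction of the cap (assumed 80%)."""
    return spend_to_date >= threshold * monthly_cap
```

A cap protects the buyer from runaway invoices, while the alert gives both sides time to restrict access or renegotiate before the limit is hit.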

Finally, for services model delivery, pricing may be even more complex, particularly where providers offer a ‘sandbox’ facility to allow prospective buyers to assess and test their data wares. Sandbox pricing may involve several tiers: the subscription fee for access to the sandbox itself; fees for data sets consumed in the sandbox; and some kind of charge-back for the infrastructure and technology needed to support the sandbox. All of this can complicate the pricing challenge, to the extent of raising the question: is it worth it?
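The three sandbox tiers can be laid out as a simple invoice calculation. This is only a sketch: the function name, the line-item labels and the idea of passing on a configurable share of infrastructure cost are assumptions for illustration, not a description of how any particular provider bills.

```python
def sandbox_invoice(access_fee: float,
                    dataset_fees: dict,
                    infra_cost: float,
                    chargeback_share: float = 1.0) -> dict:
    """Combine the three sandbox pricing tiers into one itemised invoice.

    dataset_fees maps a data set name to the fee for using it in the sandbox.
    chargeback_share is the assumed fraction of infrastructure cost passed on.
    """
    if not 0.0 <= chargeback_share <= 1.0:
        raise ValueError("chargeback_share must be between 0 and 1")
    data_total = sum(dataset_fees.values())
    chargeback = infra_cost * chargeback_share
    return {
        "sandbox_access": access_fee,
        "datasets": data_total,
        "infrastructure_chargeback": chargeback,
        "total": access_fee + data_total + chargeback,
    }
```

Itemising the invoice in this way at least lets the buyer see which tier is driving the cost, which helps when weighing up whether the sandbox is worth it.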

One way of simplifying the pricing issue is for sellers to take a more granular approach to packaging the data, which can be an effective way of lowering the barrier to entry for a greater number of potential buyers. By offering a lower per-unit price, sellers may enable smaller buyers to buy exactly what they need. This approach may also bring incremental data purchases within reach of buyers whose spending authority is limited by a budget ceiling.

In capital markets, for instance, rather than offering a full feed of market data, a supplier could offer packages defined, say, by market depth by symbol or by asset class. By pricing these correctly, the seller could open up demand for the smaller packages while continuing to sell the full package at full price to larger and more sophisticated buyers.

A final consideration for sellers is the common tendency, especially in capital markets, for larger organisations to buy the same data sets more than once, often due to different lines of business being unaware of their peers’ activities. Sellers who have made a business from this state of affairs may soon find that technology is making it easier for buyers to keep track of the data sets they are acquiring and using. These suppliers need to review their processes, and take measures like developing more granular data catalogs, to make their data sets more accessible to smaller buyers and consumer teams.