Data marketplaces for decentralized machine learning

Autonomous cars need a lot of training data. However, no single manufacturer can accumulate all of it on its own (it is estimated that vehicles need one trillion road miles of training data before they can successfully drive autonomously). As a result, autonomous cars might, among other things, suffer from a geographical bias towards the area where they are manufactured and trained. A car manufactured and trained in the USA might have trouble driving on Asian or European roads. One solution to this problem is to combine the data from cars all around the world. This is where data marketplaces come into play.

Data marketplaces: platforms for exchanging data

Data marketplaces are platforms that manage the exchange of data between publishers and subscribers. Publishers are people or devices that submit data. Subscribers, which can also be people or devices, use the submitted data. Both publishing and subscribing can happen in real time and can be monetized. Devices submit and use the data autonomously, whereas people (e.g. developers) use the data in their applications.
All kinds of data can be accessed through data marketplaces (health data, social media data, or prices of different assets). Most relevant for self-driving cars are vehicle information (a vehicle’s mechanics, identity, geolocation…), driving behavior (acceleration, speed…), and road information (traffic, road quality…).

Overview of blockchain-based data marketplaces

Name | Description | Funding (ICO only)
PikcioChain | Join PikcioChain to become part of our secure, compliant and distributed data ecosystem that enables the collection & exchange of personal data. (Source: PikcioChain is a fully distributed information… | $10,000,000
RepuX | RepuX Decentralized Data Marketplace – ERC-20 Token Sale Token Sale Live. Buy REPUX ERC-20 Tokens at discounted price. Read white paper, roadmap & team. Decentralized data sharing marketplace & data… | $4,700,000
Ocean Protocol | A Decentralized Data Exchange Protocol to Unlock Data for AI (Source: Website) | $22,100,000
Sense | THE PROTOCOL FOR HUMAN KNOWLEDGE People still trade time for money, and our moral imperative is to change that Introducing the first decentralized human information marketplace, powered by the SENSE… | $21,000,000
Measurable Data Token | Decentralized Data Exchange Economy (Source: Website) |
Datum | Unlock the $120 Billion data economy. Datum is the decentralized marketplace for social and IoT data. Powered by Ethereum, BigchainDB and IPFS. (Source: | $8,738,305
Streamr DATAcoin | Unstoppable Data for Unstoppable Apps Streamr tokenizes streaming data to enable a new way for machines & people to trade it on a decentralised P2P network. (Source: Website) | $27,000,000
weeve | Empowering The Economy of Things Weeve is a global network of IoT devices autonomously buying and selling their data. Powered by next-generation cryptography, open source hardware and secured by the… |

The following sections use Streamr as an example to show what the high-level technical structure of a blockchain-based data marketplace might look like. They then use The Ocean Protocol to show how data marketplaces can use blockchain-based design features to implement quality standards.


Streamr is a Switzerland-based company. Its data marketplace consists of five components:

  • Streamr Editor: The tool used for developing decentralized apps (dapps).
Streamr Editor (Source)
  • Streamr Data Market: The data marketplace itself, where publishers and subscribers meet. Publishers are rewarded for contributing data, and subscribers might have to pay for it, depending on the kind of data. As the term “subscribers” suggests, users do not own the data they use but have the right to use it. The logic underlying the Data Market is implemented in Streamr’s Smart Contracts. The marketplace manages data from securities exchanges, connected devices, IoT sensors, and social media.
    Streamr Data Market (Source: Website)
  • Streamr Engine: The analytics platform that processes the available data so that it can be used by dapps or smart contracts. The Engine listens for changes on the Streamr Network and processes the incoming data using off-chain analytics.
  • Streamr Network: Streamr’s transport infrastructure, which consists of nodes (called Brokers in Streamr), Smart Contracts, and Streamr Clients (data publishers, which contribute data, and data subscribers, such as dapps, which use it). The Broker node is the software client that, among other things, subscribes to or publishes data. The Streamr Network coordinates the transport of information between publishers, the Streamr Engine, and subscribers. The logic underlying the Streamr Network, such as the automatic processing of new incoming data or its distribution to all relevant subscribers, is implemented in Streamr’s Smart Contracts.
Streamr Network (Source)
  • Streamr Smart Contracts: These smart contracts hold the logic for everything done by the Data Market and the Network, such as data contribution, access, incentivization, validation, and subscription management (in the case of paid data).
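
The routing role described for the Broker node can be illustrated with a small in-memory sketch. This is not Streamr's client API: the `Broker` class, its `subscribe` and `publish` methods, the stream name, and the message fields are all hypothetical and only show the publish/subscribe pattern.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT Streamr's actual client API.
Handler = Callable[[dict], None]


class Broker:
    """Toy in-memory broker that routes messages from publishers to subscribers."""

    def __init__(self) -> None:
        # Maps a stream id to the handlers subscribed to that stream.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream_id: str, handler: Handler) -> None:
        """Register a handler that is called for every message on the stream."""
        self._subscribers[stream_id].append(handler)

    def publish(self, stream_id: str, message: dict) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = self._subscribers.get(stream_id, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# A subscriber (e.g. a dapp) collects road-quality readings published by a car.
received: List[dict] = []
broker = Broker()
broker.subscribe("road-quality", received.append)
delivered = broker.publish("road-quality", {"lat": 47.37, "lon": 8.54, "roughness": 0.2})
```

In a real network the routing, payment, and access checks would be spread across many nodes and smart contracts; the sketch keeps everything in one process to make the data flow visible.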

The Ocean Protocol: Token-curated registries for verification and quality assurance

The Ocean Protocol, a protocol and network, is working on a marketplace for AI services and AI data (data that can be used for AI-related tasks). The Ocean Protocol is worth mentioning because it has put in place blockchain-enabled verification and quality assurance processes using a so-called token-curated registry.

Token-curated registries: blockchain-based mechanism to curate lists

A token-curated registry (TCR) is, as the name suggests, a curated registry (“list”) that contains items that belong to one specific group and that satisfy some criterion. For example, in a token-curated registry of dog-friendly hair salons, the group is “hair salons” and the criterion is “dog-friendly”. Token-curated registries work by incentivizing all participants to create the best list possible. The participants are candidates (those that want to be on the list), members (those that are on the list), consumers (those that use the list), and token holders (those that decide by voting whether a member stays a member and whether a candidate becomes a member). The incentives work as follows: members and candidates are fined if they are fraudulent (e.g. when they applied to be on the list but are not dog-friendly anymore), and token holders are rewarded when they maintain the list (e.g. when they background-check an applicant to verify that it is really dog-friendly).
The Ocean Protocol uses token-curated registries in two ways: firstly, to simulate KYC (know-your-customer) processes so that only good publishers are whitelisted, and secondly, to validate owner rights.

The Ocean Protocol’s token-curated registry for whitelisting publishers

In order to have a high-quality marketplace, the publishers must be trustworthy. The Ocean Protocol wants to ensure this through a whitelisted registry of good members. Somebody can only be whitelisted if she makes a financial deposit and if The Ocean Protocol’s token holders do not consider her fraudulent (by voting against her).

Concretely: Everybody who wants to become a publisher must make a financial deposit for a so-called trial or challenge period. If nobody considers her fraudulent during this period, she is whitelisted and keeps her deposit. However, if somebody believes the publisher to be fraudulent, this critic starts a so-called challenge (i.e. the applicant is “challenged”). During this challenge, token holders can vote, and the applicant’s status and deposit depend on the outcome of the poll. If the majority votes in favor of the publisher (i.e. they consider her legitimate), she is whitelisted and keeps her deposit. Otherwise, she is not added and loses her deposit.
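
The application flow above can be sketched as a small state machine. This is a minimal illustration, not The Ocean Protocol's implementation: the class and method names are hypothetical, each ballot counts once (the text does not say whether votes are weighted by token holdings), a tie or a challenge without any votes counts as a rejection, and forfeited deposits are only tallied because the text does not say who receives them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Application:
    publisher: str
    deposit: float
    challenged: bool = False
    votes: List[bool] = field(default_factory=list)  # True = vote for the publisher


class PublisherRegistry:
    """Hypothetical token-curated whitelist of publishers (illustrative only)."""

    def __init__(self) -> None:
        self.whitelist: Set[str] = set()
        self.pending: Dict[str, Application] = {}
        self.forfeited: float = 0.0  # assumption: lost deposits are just tallied

    def apply(self, publisher: str, deposit: float) -> None:
        """Start the trial/challenge period by locking a deposit."""
        if deposit <= 0:
            raise ValueError("a positive deposit is required")
        self.pending[publisher] = Application(publisher, deposit)

    def challenge(self, publisher: str) -> None:
        """A critic who mistrusts the applicant opens a challenge."""
        self.pending[publisher].challenged = True

    def vote(self, publisher: str, in_favor: bool) -> None:
        """A token holder votes; only possible while a challenge is open."""
        application = self.pending[publisher]
        if not application.challenged:
            raise ValueError("no open challenge for this applicant")
        application.votes.append(in_favor)

    def close_period(self, publisher: str) -> float:
        """End the period and return the amount of deposit refunded."""
        application = self.pending.pop(publisher)
        votes_for = sum(application.votes)
        votes_against = len(application.votes) - votes_for
        # Unchallenged applicants pass; challenged ones need a strict majority.
        accepted = (not application.challenged) or votes_for > votes_against
        if accepted:
            self.whitelist.add(publisher)
            return application.deposit
        self.forfeited += application.deposit
        return 0.0


registry = PublisherRegistry()

# Nobody challenges "alice": she is whitelisted and keeps her deposit.
registry.apply("alice", 100.0)
refund_alice = registry.close_period("alice")

# "bob" is challenged and loses the vote 1 to 2: not added, deposit lost.
registry.apply("bob", 100.0)
registry.challenge("bob")
for ballot in (True, False, False):
    registry.vote("bob", ballot)
refund_bob = registry.close_period("bob")
```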

The Ocean Protocol’s token-curated registry for validating owner rights

The other area where The Ocean Protocol uses a token-curated registry is the validation of owner rights. To ensure that publishers only submit data that they are allowed to submit, a system for checking owner rights must be put in place. The process is very similar to the voting mechanism described above: when a publisher submits data, she must make a financial deposit for the duration of the challenge period. If nobody challenges the owner rights during this period, the publisher is allowed to submit the data. However, if a challenge is raised, the publisher’s deposit and reward (the publisher is rewarded if somebody uses her data) depend on the challenge’s outcome. If the majority votes for the publisher, she keeps her deposit and can submit the data. Otherwise, she loses her deposit and is prohibited from submitting the data. Moreover, the publisher is removed from the “whitelisted publishers” registry.
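
The three consequences of a lost ownership challenge (deposit lost, data rejected, publisher removed from the whitelist) can be made explicit in a short sketch. The function and field names are hypothetical, a tie counts as a loss for the publisher, and the requirement that only whitelisted publishers may submit is an assumption inferred from the text rather than a documented rule.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SubmissionResult:
    data_accepted: bool
    deposit_returned: float
    still_whitelisted: bool


def resolve_submission(
    publisher: str,
    deposit: float,
    whitelist: Set[str],
    challenged: bool = False,
    votes_for: int = 0,
    votes_against: int = 0,
) -> SubmissionResult:
    """Settle a data submission at the end of its challenge period (illustrative)."""
    if publisher not in whitelist:
        # Assumption: only whitelisted publishers may submit data at all.
        raise ValueError("publisher is not whitelisted")
    # No challenge, or a strict majority for the publisher: ownership is upheld.
    upheld = (not challenged) or votes_for > votes_against
    if upheld:
        return SubmissionResult(True, deposit, True)
    # Lost challenge: deposit is forfeited, data is rejected, publisher is delisted.
    whitelist.discard(publisher)
    return SubmissionResult(False, 0.0, False)


whitelisted_publishers = {"alice", "bob"}
unchallenged = resolve_submission("alice", 50.0, whitelisted_publishers)
lost = resolve_submission(
    "bob", 50.0, whitelisted_publishers, challenged=True, votes_for=2, votes_against=5
)
```

Rewards for data usage are left out of the sketch; per the text they would only be paid out for submissions whose ownership was upheld.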


