Big Data and Blockchain

Cryptocurrencies on the Internet

Cryptocurrencies, or virtual currencies (VCs), are digital representations of value that can be transferred, stored, or traded electronically. They are issued neither by a central bank nor by a public authority, but are accepted by people as a means of payment. VCs are designed to be optimized for digital networks while being user-friendly, cost-effective and verifiable.

The most popular virtual currency to date is Bitcoin [1], which relies on the concept of Blockchain, a distributed and shared ledger technology in which all transactions are securely recorded, thus allowing any participant in a business network to see and check the validity of a transaction. High Performance Computing (HPC) platforms and Big Data paradigms are expected to play a key role in VCs, and DtoK Lab is active in these technology fields.

A distributed ledger is a technology that replicates the “ledger” across all the nodes in the network by grouping transactions into blocks and linking the blocks through cryptographic mechanisms. Because it is built this way, it is also called a Blockchain. Since it is a distributed system operating on sharing principles, consensus mechanisms are introduced to support the voting schemes that decide which node updates the content of the Blockchain. Key application areas of distributed ledger technology are: smart contracts, Know Your Customer (KYC) and Anti-Money Laundering (AML), deposits and lending, capital raising, investment management, payments, insurance, market provisioning, and cryptocurrencies.
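The block-linking idea can be sketched in a few lines of Python. This is a minimal illustration, not how Bitcoin or any specific platform is implemented: the field names (`index`, `transactions`, `prev_hash`), the JSON serialization and the helper functions are assumptions made for the example, and no consensus mechanism is modelled.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, transactions, prev_hash):
    """Return the SHA-256 digest of a block's contents, including the previous hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a new block that stores the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(index, transactions, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["transactions"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True


chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(chain))  # True

# Tampering with an already recorded transaction breaks the stored hash.
chain[0]["transactions"][0]["amount"] = 500
print(is_valid(chain))  # False
```

Because each block stores the hash of its predecessor, changing any recorded transaction invalidates that block's stored hash, and recomputing the hash would in turn break the link held by the next block. This is what lets every participant check the ledger independently.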


Big Data technology and Blockchain 

Big data refers to massive and heterogeneous digital content that is difficult to process using traditional data management tools and techniques. The term covers the complexity and variety of data and data types, the need for real-time data collection and processing, and the value that can be obtained through smart analytics.

Financial services and Blockchain, in particular, may benefit from the use of big data analysis. In fact, applying data analysis strategies, such as those developed by DtoK Lab, to Blockchains makes it possible to identify trends, models and threats in the data produced and exchanged.

Big data mining applications can run pattern recognition tasks on thousands to millions of Blockchain interactions to identify malicious users and abusive uses. At the same time, Blockchain data can be clustered and classified to assess the trustworthiness of banks, operators and financial services. The distributed analysis algorithms that can be used in Blockchain analysis may represent a significant added value, through which useful knowledge can be extracted from the large number of blocks available. In particular, distributed techniques and models for data stream mining must be investigated and evaluated for real-time Blockchain analysis. This work will also bring benefits to digital payment systems that are going to be deployed in the near future.

Big data analytics must be used to identify fraudulent operations on bitcoins and, more generally, fraudulent use of currencies [2]. These solutions are going to be more and more important, and they are orthogonal to the use of security mechanisms, since analysis techniques can identify malicious behaviors that are composed of single operations, each apparently correct and secure. Being able to identify people, companies and users suspected of having committed, or of being about to commit, fraud is vital to make Blockchain and financial transactions secure and legal, and to help people trust them.
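The point about behaviors built from individually acceptable operations can be illustrated with a small Python sketch. It assumes a hypothetical scenario in which every transfer passes a per-operation check, yet the total sent by one account within a time window is unusually high. The thresholds, the tuple layout `(timestamp, sender, amount)` and the function name are assumptions made for the example and are not taken from any real platform or regulation.

```python
from collections import defaultdict

PER_TRANSFER_LIMIT = 1_000  # hypothetical per-operation check
WINDOW_SECONDS = 3_600      # hypothetical look-back window
AGGREGATE_LIMIT = 3_000     # hypothetical per-sender total within the window


def flag_split_transfers(transfers):
    """Return senders whose individually valid transfers add up to a suspicious total.

    `transfers` is an iterable of (timestamp_seconds, sender, amount) tuples.
    """
    by_sender = defaultdict(list)
    for ts, sender, amount in transfers:
        if amount <= PER_TRANSFER_LIMIT:  # each operation looks fine in isolation
            by_sender[sender].append((ts, amount))

    flagged = set()
    for sender, ops in by_sender.items():
        ops.sort()
        start = 0    # index of the oldest operation still inside the window
        running = 0  # sum of amounts currently inside the window
        for ts, amount in ops:
            running += amount
            # Drop operations that fall outside the look-back window.
            while ts - ops[start][0] > WINDOW_SECONDS:
                running -= ops[start][1]
                start += 1
            if running > AGGREGATE_LIMIT:
                flagged.add(sender)
                break
    return flagged


transfers = [
    (0, "u1", 900), (600, "u1", 900), (1200, "u1", 900), (1800, "u1", 900),
    (0, "u2", 900), (7200, "u2", 900), (14400, "u2", 900), (21600, "u2", 900),
    (100, "u3", 50),
]
print(sorted(flag_split_transfers(transfers)))  # ['u1']
```

Both `u1` and `u2` send the same four amounts, and no single transfer is remarkable. Only the aggregate view over time separates them, which is why this kind of analysis complements per-transaction security checks rather than replacing them.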

Artificial intelligence applications for Blockchain

The application areas of AI and machine learning in distributed ledger management and Blockchains are many and all important. The first one to mention is threat identification and fraud prevention. These issues can be addressed through the use of pattern mining, text mining and outlier detection algorithms. All of these techniques are already used in finance and banking, so they have already shown their effectiveness in the financial domain.

However, they need to be adapted and advanced to be effective and efficient in distributed Blockchain platforms. Another application area is real-time decision making. This task needs scalable platforms and algorithms. For distributed real-time analysis, data stream learning algorithms and systems can be used. Their execution on HPC and/or Cloud platforms may offer the performance and scalability needed to provide responses within the given deadlines.
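As a rough illustration of what a stream-oriented outlier detector looks like, the following Python sketch keeps a running mean and variance with Welford's online update and flags values that lie far from the mean. It is a single-node toy under stated assumptions, not a distributed or production method: the class name, the z-score threshold and the warm-up length are all hypothetical choices for the example.

```python
import math


class StreamingOutlierDetector:
    """Flags values far from the running mean, using Welford's online update."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold     # hypothetical z-score cut-off
        self.warmup = max(warmup, 2)   # observations to see before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                  # running sum of squared deviations

    def update(self, x):
        """Score x against the statistics seen so far, then fold it in."""
        is_outlier = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_outlier = True
        # Welford's single-pass update of the mean and of M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = StreamingOutlierDetector()
stream = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 500, 10]
flags = [detector.update(value) for value in stream]
print([i for i, flagged in enumerate(flags) if flagged])  # [11]
```

Each value is processed once in constant time and memory, which is the property that makes this family of algorithms suitable for real-time use. In this sketch a flagged value is still folded into the statistics, which inflates the variance afterwards; whether to exclude flagged values is a design decision that depends on the application.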

Today it is not sufficient to use information systems to manage governance, risk and compliance activities. The next step is leveraging and analyzing big data to ensure that an organization meets its objectives. In the financial area this is even more critical, so the use of scalable data analysis strategies will bring added value and allow for improving governance, reducing risks and assuring compliance.

Other important application areas where big data analysis and learning algorithms can bring benefits are gaining customer insight, pricing optimization, improving operational efficiency, and reducing management costs [3].



  1. S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. 2009.
  2. J. Baron, A. O’Mahony, D. Manheim, C. Dion-Schwarz. National Security Implications of Virtual Currency – Examining the Potential for Non-state Actor Deployment. RAND National Defense Research Institute, 2015.
  3. J.A. Kroll, I.C. Davey, and E.W. Felten. The Economics of Bitcoin Mining, or Bitcoin in the Presence of Adversaries. In WEIS 2013, 2013.