Dark Data at the Enterprise Level: What is it and What Risks Does it Pose?

The term ‘Big Data’ is often referenced colloquially throughout the world of digital information technology; its value stems from the fact that it provides fundamental insights into consumer, market, and business dynamics. These insights are typically revealed through the implementation of AI-driven classification and predictive algorithms, which allow businesses to monetize their datasets and streamline collaborative data processing practices through multi-cloud environments that span several business domains. 

While this appears relatively straightforward in principle, Big Data is expanding exponentially. As a result, it is particularly difficult for enterprises to disseminate the massive amounts of data they acquire across the organization; one of the most common problems companies encounter is data silos, which are digital repositories of information that remain isolated within specific business domains of an enterprise.

These silos hinder an organization’s ability to operationalize and quantify its data for the purpose of data-driven decision-making. Additionally, they raise a variety of other concerns about an enterprise’s ability to comply with the relevant regulatory frameworks, mitigate cybersecurity risks, streamline the use of multi-cloud environments and collaborative data processing approaches, and gain valuable insight into consumer and market behavior.

Unfortunately, Big Data, and the problems it generates at the enterprise level, are just the tip of the iceberg. This is where Dark Data enters the scene; it encapsulates everything that lies beneath the surface. Importantly, Dark Data is technically distinct from Big Data; it is data harbored by an organization that does not yet have any monetary value, practical use, or coherent structure. In fact, the State of Dark Data report, sponsored by Splunk, revealed that globally, approximately 55% of all data obtained by businesses is dark. 

Dark Data, in and of itself, is not useless. On the contrary, as the imperative to build data-driven business models grows, the ability to analyze and process Dark Data is paramount, especially if companies seek to generate a competitive market advantage for themselves while maintaining the integrity and security of their data infrastructure.

Let us now explore some of the ways in which an enterprise can mitigate the security and regulatory risks posed by untapped reserves of Dark Data as well as the kinds of steps a business could take to increase its abilities to analyze and process said data.

What are the Risks? 

As more businesses adopt data fabric approaches to data infrastructure, which function through collaborative multi-cloud environments and Machine Learning analytics, their abilities to make sense of and organize their data will be challenged due to increasing complexity. This is troubling when considering that more than 90% of enterprises have now fully implemented multi-cloud structures into their business model. 

While multi-cloud platforms allow companies to build scalable data storage and acquisition infrastructure, they also generate a number of security risks. These risks can be moderately reduced using asset discovery tools, which help track various new software licenses and downloads as well as novel DevOps instruments, all of which are consolidated into a database. However, given the complexity and pace of digital transformation, it is virtually impossible for companies to keep track of all their digital assets, especially those that are not used frequently. 
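
To make the point about infrequently used assets concrete, here is a minimal sketch of how a consolidated inventory might be scanned for assets that have gone idle. The `Asset` record, its fields, the provider labels, and the 180-day threshold are all illustrative assumptions, not the schema or defaults of any particular discovery tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Asset:
    """One row of a hypothetical consolidated asset inventory."""
    asset_id: str
    provider: str            # illustrative label such as "aws" or "azure"
    last_accessed: datetime


def find_stale_assets(assets, now, max_idle_days=180):
    """Return assets that have not been accessed within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [a for a in assets if a.last_accessed < cutoff]


# Made-up sample data for illustration only.
inventory = [
    Asset("crm-export-2019", "aws", datetime(2019, 6, 1)),
    Asset("orders-live", "aws", datetime(2024, 1, 10)),
    Asset("legacy-logs", "azure", datetime(2021, 3, 15)),
]

stale = find_stale_assets(inventory, now=datetime(2024, 2, 1))
print([a.asset_id for a in stale])
```

Assets flagged this way are candidates for review rather than deletion; whether they are truly dark depends on what they contain and who is responsible for them.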

Some of the tools companies might employ to monitor cloud inventory are Azure Security Center and AWS Systems Manager, as well as tools from IBM, Google, and Oracle Cloud. The functions of these tools vary from tracking inventory and product management at several locations, both on and off-site, to insurance claim assessment, fraud prevention, and digital asset management.

There are, however, some drawbacks to these tools that are worth considering, especially within the context of Dark Data. First, when a company decides to switch cloud service providers to streamline digital transformation, it can use cloud inventory tools to discover the native digital assets that it wants to integrate into its data infrastructure throughout the transition. However, cloud inventory tools cannot ‘see’ into multiple cloud environments – when data is moved from one cloud to another, there is a high likelihood that non-native digital assets will be lost, effectively becoming Dark Data.

As a result, large amounts of unused enterprise data will remain strewn throughout several cloud infrastructures and legal jurisdictions (depending on where data servers are located), even after an organization has made the switch from one platform to another. This results in a convoluted representation of a given business’s data footprint, which leads to concerns surrounding data security and privacy as well as regulatory compliance.

For instance, sensitive consumer information regarding financial standing, behavioral characteristics, or medical data could be compromised in security breaches that occur after enterprise cloud services have been changed. Cloud service providers usually notify enterprises of security breaches, but if providers have changed, then businesses have no way of knowing whether the remnants of their data on previous cloud servers are secure. From the regulatory standpoint, it is also extremely difficult for lawmakers to ensure that data harvesting practices adhere to legal frameworks because they cannot track the flow and acquisition of data accurately across multi-cloud platforms.
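
The migration scenario described above can be sketched as a comparison between two inventories. This is only an illustration under stated assumptions: the inventories are plain dictionaries mapping a hypothetical asset ID to the region where it is stored, and the asset names and regions are invented. Real inventory exports would need to be normalized into a shared format first.

```python
def find_orphaned_assets(source_inventory, target_inventory):
    """Return IDs listed by the old provider but missing from the new one."""
    return sorted(set(source_inventory) - set(target_inventory))


# Hypothetical inventories: asset ID -> storage region.
old_cloud = {
    "customer-db": "eu-west",
    "billing-archive": "us-east",
    "ml-features": "eu-west",
}
new_cloud = {
    "customer-db": "eu-central",
    "ml-features": "eu-central",
}

orphaned = find_orphaned_assets(old_cloud, new_cloud)
for asset_id in orphaned:
    # The region hints at which jurisdiction the leftover data sits in.
    print(asset_id, old_cloud[asset_id])
```

Running a comparison like this before decommissioning the old account gives the business a list of leftover assets, and their locations, to migrate, archive, or delete deliberately.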

How Should Enterprises Mitigate these Risks?

Fortunately, there are a few approaches an enterprise could take to mitigate the risks of Dark Data while also ensuring it is properly used when there is revenue potential. The same State of Dark Data report mentioned earlier also revealed that approximately 81% of individuals working in data-driven corporate environments believe that achieving a senior-level position is heavily influenced by one’s data literacy. This statistic highlights how important data science is at the enterprise scale and underscores the importance of understanding digital information exchange.

The primary issue with Dark Data is being able to distinguish it from native digital assets in the cloud. There are a few ways in which this issue can be resolved:

  1. Building data intelligence; by mapping out the distribution of data throughout unstructured and structured datasets, enterprises can streamline their abilities to identify and label relevant data points. They can also use AI-driven automation systems to organize their data into asset catalogs. 
  2. Ensuring the business is data-driven; by actively building a corporate culture that values data science and skills, enterprises can increase their data literacy. This can be achieved by providing corporate-sponsored training programs in addition to recruitment outreach that targets highly skilled data scientists.
  3. Building data infrastructure that complies with regulation; this involves identifying which data belong to which individuals and subsequently mapping out these relationships at the enterprise scale. This would allow enterprises to increase the efficiency and productivity of their business models while also respecting consumers’ rights to consent to data acquisition and request the release of personal data at will (under regulatory frameworks like the GDPR). 
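
The first and third points above can be illustrated with a small sketch: a catalog whose entries are labeled as structured or unstructured and linked to the individual they belong to, plus a lookup that answers "what do we hold about this person?". The `CatalogEntry` fields, dataset names, and subject identifiers are hypothetical, and a real catalog would be populated by discovery and classification tooling rather than by hand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One labeled item in a hypothetical enterprise asset catalog."""
    record_id: str
    dataset: str
    subject_id: str    # assumed identifier for the individual the data belongs to
    structured: bool   # True for tabular data, False for documents, logs, etc.


def build_subject_index(entries):
    """Group catalog entries by the individual they belong to."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.subject_id].append(entry)
    return dict(index)


def locate_subject_data(index, subject_id):
    """List (dataset, record_id) pairs held about one individual."""
    return [(e.dataset, e.record_id) for e in index.get(subject_id, [])]


# Made-up sample catalog for illustration only.
catalog = [
    CatalogEntry("r1", "crm.customers", "subject-001", True),
    CatalogEntry("r2", "support.email_archive", "subject-001", False),
    CatalogEntry("r3", "crm.customers", "subject-002", True),
]

index = build_subject_index(catalog)
print(locate_subject_data(index, "subject-001"))
```

With an index like this in place, a request to release an individual's personal data becomes a lookup rather than a search across every silo.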

As our digital information systems become more intelligent and embedded in everyday life, for instance through the integration of IoT technology, their abilities to harvest and analyze data will grow as well. The exponential growth of data, while it will eventually plateau, will not do so anytime soon; this means that whether we like it or not, all businesses, regardless of industry, will necessarily become data-driven.

If enterprises want to keep up with the expansion of Big Data in ways that generate profit but also respect consumers and regulatory frameworks, they must begin implementing new data infrastructures or practices that can both identify and distinguish Dark Data at scale while maintaining native digital assets.

Contributor

Sasha Cadariu is currently pursuing an MSc in Bioethics at King’s College, London. Prior to engaging in his current studies, Sasha was a Division 1 Ski Racer at Bates College, where he graduated with a Bachelor’s in Cognitive Psychology and Classical Philosophy. He is deeply interested in applied ethics, specifically with respect to AI-driven exponential technologies and how they might one day affect humanity.