Delving into Dark Data on Azure – Data Governance in the Cloud

For most organizations, dark data is a vague concept: the knowledge that, somewhere, you have vast amounts of stored data, and no real idea what it is. Gartner coined the term to refer to data that organizations collect but fail to use or monetize, and eventually lose track of.

That data, which is stored in network file shares, collaboration tools (e.g., SharePoint), online storage services like Drive and Dropbox, old PCs, and backups, is dark because most people in the organization have no idea what’s in it. In fact, that data is often stored in legacy systems or was placed on drives by people who have since left the organization. But as organizations move to the cloud and must choose whether to leave data where it is or move it to Azure Blob Storage, it becomes more of an issue – not just for its potential business value but for regulatory compliance.

Dark Data Can Include Private Data

Dark data comes with no promise of delivering business value. Yet organizations cannot ignore it. Dark data often contains everything from personally identifiable information to HR data, legal contracts, security and access information, and other confidential or proprietary information. This presents real liabilities in information governance, especially in industries such as finance and the public sector. And for global companies, it becomes increasingly crucial to address data analytics and governance together in order to meet data privacy laws across the EU and USA.

Knowing your enterprise data and being able to search it would be ideal. However, the absence of labels, categories, and metadata in general makes it hard to choose what to send to AI for analysis and discovery, who receives access to what data, and what data to keep (and where to keep it). Most businesses have dark data precisely because it takes too much manual effort to sort and label. But dark data presents unknown potential and unknown risks – without understanding its contents, no organization can make sound decisions about what to do with it.

A Significant Governance Footprint

Both structured and unstructured data can be part of dark data, but more unstructured than structured data resides in the dark.

Why? Unstructured data is harder to process by computer, so much of it requires significant manual processing. Azure cloud compute and storage are elastic and scale on demand, which makes it possible to process all data efficiently and cost-effectively. This option is not readily available in on-premises data centers. With SynerScope positioned on top of the customer’s Azure object store (Blob or ADLS), enterprises can quickly and economically see what content they have. More importantly, they can use this information to take action.

For example, the underlying contracts and correspondence for 10-year-old invoices cannot be handled without proper governance. In the Azure cloud, you can generate that data. Yet if there are multiple back-ends from different SaaS suppliers, moving dark data to the cloud is impaired from a governance and risk perspective. That’s why SynerScope’s SaaS-like application uses the storage on the customer’s Azure tenant. As a result, all data protection and security are regulated by the single contract between the customer and Microsoft Azure. This simplicity allows the enterprise to confidently move data to the cloud, knowing that responsibilities and liabilities are clearly defined.

Categorizing Dark Data in the Azure Cloud

At SynerScope we deliver the tools to unlock dark data, using machine learning to sort by content whilst your domain experts add context. Our AI sorts data visually, “stacking” content based on visual similarity – and highlighting keywords and descriptors pulled from the stack. Your domain expert can use those to add context to the stack – quickly identifying whether something is an invoice, a mortgage receipt, a single customer’s banking data, etc.

The software installs into your Azure tenant, leaving data in its existing system structure, governed only by your Azure contract. SynerScope runs similarly to an Azure module: we bring data into cache memory, compute on it, and the newly generated metadata augments the original data. These data artefacts are moved into the storage that you, as a client, set up and manage. We provide the support for you to:

  • Find relevant structured and unstructured data, open it for control, data governance, and maintainability for GDPR compliance
  • Find and structure data for governance to meet compliance requirements in finance, public sector, etc.
  • Improve triage for files to be inspected in KYC, CDD, PDD, and AML investigations
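The flow described above (read data into memory, compute on it, and store the generated metadata alongside the untouched original) can be sketched in a few lines. This is only an illustration under loud assumptions: the `store` dictionary stands in for a customer-managed object store, simple keyword counting stands in for the machine learning, and every name here (`store`, `extract_keywords`, `augment`) is hypothetical rather than part of any SynerScope or Azure API.

```python
import re
from collections import Counter

# Hypothetical stand-in for a customer-managed object store:
# object name -> original content plus a metadata dictionary.
store = {
    "scan_001.txt": {
        "content": "Invoice 2011-0042: invoice total due to supplier",
        "metadata": {},
    },
    "scan_002.txt": {
        "content": "Mortgage receipt for mortgage payment received",
        "metadata": {},
    },
}

# Words too generic to be useful as descriptors (illustrative list).
STOPWORDS = {"total", "with", "from", "this", "that"}


def extract_keywords(text, top_n=3):
    """Return the most frequent words of four or more letters."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def augment(objects):
    """Add generated metadata to each object without altering its content."""
    for obj in objects.values():
        cached = obj["content"]                    # bring data into memory
        keywords = extract_keywords(cached)        # compute on it
        obj["metadata"]["keywords"] = keywords     # metadata augments the original


augment(store)
for name, obj in store.items():
    print(name, obj["metadata"]["keywords"])
```

The point of the design is that the original content is never modified; only the metadata grows, and it stays in the same store as the data it describes.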

Most importantly, this applies both to stored dark data and to the massive quantities of data churned out by CMS, self-service, surveys, and specifics like KYC programs and security. SynerScope delivers tooling to make the move to the cloud possible with dark data analysis – so that the organization implements proper governance on all data as it moves to the cloud – while creating structure and insight into new data.

Granular Insight into Big Data

SynerScope gives deep insight into not just dark data, but any data. By mapping data visually and relying on data experts to create connections, we speed up data analysis across nearly any type of data.

To take a specific example, KYC is incredibly important for banks and other financial organizations. Automatic alert systems can have a false positive rate of 5% or more, and each alert requires manual review. If each manual file review takes 4+ hours, a 5% false positive rate is a massive burden on the company. But SynerScope’s machine learning, which categorizes and sorts data, speeds up this manual review by as much as 20x.
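A rough calculation makes the scale of that burden concrete. The 5% rate, the 4 hours per review, and the 20x speed-up come from the figures above; the volume of 10,000 screened cases is a purely hypothetical number chosen for illustration, and the sketch assumes the false positive rate means the share of screened cases that raise a false alert.

```python
def review_hours(screened_cases, false_positive_rate, hours_per_review, speedup=1.0):
    """Estimate hours spent manually reviewing false-positive alerts."""
    false_alerts = screened_cases * false_positive_rate
    return false_alerts * hours_per_review / speedup


SCREENED_CASES = 10_000      # hypothetical volume, not a figure from the article
FALSE_POSITIVE_RATE = 0.05   # 5%, the lower bound quoted above
HOURS_PER_REVIEW = 4         # 4 hours, the lower bound quoted above

baseline = review_hours(SCREENED_CASES, FALSE_POSITIVE_RATE, HOURS_PER_REVIEW)
accelerated = review_hours(
    SCREENED_CASES, FALSE_POSITIVE_RATE, HOURS_PER_REVIEW, speedup=20
)

print(f"Manual review without assistance: about {baseline:,.0f} hours")
print(f"Manual review at a 20x speed-up:  about {accelerated:,.0f} hours")
```

Under these assumptions the review effort drops from roughly 2,000 hours to roughly 100 hours. Actual savings depend on real alert volumes and on how often the full 20x speed-up is reached.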

As data continues to accumulate in the cloud, SynerScope’s role in making day-to-day compliance and governance decisions will grow. That applies to retrieving data, deciding where to store it, and deciding whether to keep that data in the first place.

