Wet open overheid: getting your information management in order, just like 'getting your house in order'

The Dutch Open Government Act (Wet open overheid, Woo): from obligation to improvement…

In the TV programme 'Je huis op orde' ('Your house in order'), presented by Viktor Brand, families were given a clear assignment: part with a large share of their belongings so they could move on with a decluttered home. The items they selected to let go had to be placed in one of three sections: Donate, Sell, or Recycle.

Digitization, combined with an urge to collect, has created a largely comparable chaos of digital data at many government organizations.

The story

The programme makers collect all the belongings and lay them out, sorted, on the floor of a 2,000 m² warehouse. Then the family walks in, and the astonishment is immediate: first at the sheer quantity, and then, gradually, the recognition of the objects in the different sections unfolds. Next, they have to get to work themselves, choosing which belongings to part with and dividing those over the sections Donate, Sell, or Recycle.

“The programme beautifully captures the human urge to collect, and the impossibility of starting to turn chaos into structure and order without an overview first”


(With thanks to Talpa and SBS6)

We generate and collect data ever faster, and with storage costs of nearly nil it quickly grows into an unmanageable volume. Its distribution across different departments and system applications results in rigid data silos. Nobody knows the whole, and it is doubtful whether the individual details are well known either.

A comparison with all the different cupboards, rooms, attics, and garages of a household suggests itself. The overview of the data's contents is missing, and so making a start at creating order has become difficult, if not impossible.

If we first brought all this data into a sorted, ordered overview, half the work of getting 'information management in order' would already be done. But for that we lack a team of programme makers. And the artisanal approach of deploying more people, whether or not hired externally, to inventory and mark file by file and page by page is neither practically nor economically feasible at today's scale of data and information. It would also largely be a return to the kind of approach used in the paper world of the past.

SynerScope sorts, categorizes, and reveals patterns

Digital technology causes the data problem, but it also increasingly provides the means to develop a 'Je huis op orde'-style approach and apply it to all digital data. The big warehouse is found in scalable public cloud infrastructure such as Microsoft Azure. Receiving, sorting, and visually laying out the data can be done with software. SynerScope has developed a very powerful product in this segment that handles not only (semi-)structured data but also unstructured data. SynerScope sorts, categorizes, and reveals patterns in the data, giving the organization's domain experts all the information and context they need to apply markings and labels at detail level. Not page by page, however, but for whole groups of pages, documents, or files at once, so that great speed can be achieved without compromising quality.

Wet open overheid (Woo)

Of course, the task facing the data and information management of government organizations is more complex than the straightforward decluttering in the TV programme. Multiple rules and laws apply to the handling of government data. The Archiefwet (Archives Act) specifies which data must be kept and which destroyed, and when and for how long. The Woo specifies which data must be actively published, and how this is to be phased in over the coming years across different categories of government data and information. Older data can still be requested in the old Wob manner, but under the Woo's response-time regime. Tasks such as the Omgevingswet (Environment Act) and the care domain significantly raise the difficulty of getting in control of all data. Running through and across all of this are the obligations that follow from the AVG, the Dutch implementation of the GDPR. Privacy protection calls for masking; openness calls for applying it selectively; and deciding what to mask calls for a good overview of, insight into, and detailed view of the data.

In short: for every decision in each of the aforementioned areas, as well as for policy decisions, knowing the data is always a prerequisite.

SynerScope labels and organizes

After the computer has sorted the data and displayed its patterns, users with domain knowledge provide each sorted data 'section' with labels that mark its content and thereby differentiate the sections from one another. These labels (also called tags or metadata) are highly valuable for reuse throughout the organization. New, unknown data can also be mixed with such previously labeled data, making it possible to transfer earlier acquired knowledge directly to newly ingested data.
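That transfer of earlier labels to newly ingested data can be pictured as a toy nearest-neighbour label transfer. This is a sketch under our own assumptions (word-set Jaccard similarity, hypothetical helper names), not SynerScope's actual method:

```python
def tokens(text):
    """Lowercase word set of a document."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def propagate_label(new_doc, labeled_docs):
    """Give a new document the label of its most similar labeled document."""
    new_t = tokens(new_doc)
    best_label, best_score = None, -1.0
    for text, label in labeled_docs:
        score = jaccard(new_t, tokens(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Previously labeled data 'sections' (illustrative examples)
labeled = [
    ("bouwvergunning aanvraag perceel gemeente", "vergunningen"),
    ("subsidie aanvraag cultuur stichting", "subsidies"),
]
label, score = propagate_label("aanvraag bouwvergunning voor perceel", labeled)
```

In practice one would of course use richer document representations, but the principle is the same: earlier labeling work keeps paying off on new data.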

We would be happy to let you and your organization experience what getting information management in order means when new methods and processes are deployed with the support of SynerScope technology. There is an opportunity to turn the Woo from an obligation into a powerful impulse for your organization to work better with its digital assets, and thereby greatly improve its service to citizens and society. A government organization that quickly knows more about its data can tailor the publication of data and information better to the needs of its various stakeholders and constituencies: a shift from informing afterwards to involving beforehand, transparently presenting policy alternatives and laying open the different trade-offs, all with the fullest possible context of the underlying data.

Webinar

In our upcoming webinar we will show you what this looks like in practice. Organizing a pilot with data from your own organization is, of course, also possible. Since a pilot with your own data can be set up within a few days, testing is more efficient than holding meetings about the possibilities…

Sign up now for our webinar, and request a pilot afterwards.

 

Artificial Intelligence is Only as Good as Data Labeling

Data Labeling with SynerScope

Recent events in my home country inspired me to write this blog. Every day we hear stories about businesses and government organizations struggling to sufficiently understand individual files or cases. Knowledge gaps and a lack of access to good information hurt individual and organizational well-being. Sometimes the prosperity of society itself is affected, as with large-scale financial crime and remediation cases in banking, insurance, and government, or with pandemics.

We simply have too little understanding of the data, which means AI and analytics are set up to fail. In addition, it's difficult to see what data we can, or may, collect to run the human-computer processes of extracting the relevant information needed to solve those issues.

Unlimited Data with No Application

The COVID-19 pandemic shows not only how difficult it is to generate the right data, but also how difficult it is to use existing data. As a result, data-driven decision-making often exposes gaps in our understanding of that data.

Banks spend billions on technology and people in KYC, AML, and customer remediation processes. Yet, they’re still not fully meeting desired regulatory goals.

Governments also show signs of having difficulties with data. For example, recent scandals in the Dutch tax office, such as the Toeslagenaffaire, show how difficult it is to handle tens of thousands of cases in need of remediation. And the Dutch Ministry of Economic Affairs is struggling to determine individual compensation in Groningen, where earthquakes caused by gas extraction have damaged homes.

Today, the world is digitized to an unbelievable extent. So society, from citizens to the press to politicians and the legal system, overestimates the capability of organizations to extract the right information from data that is so abundantly available.

After all, those organizations, their data scientists, IT teams, cloud vendors, and scholars have promised a world of well-being and benevolence based on data and AI. Yet their failure to deliver on those promises is certainly not a sign that conspiracy theories are true. Rather, it shows the limits of AI in a world where organizations understand less than half of the data they have when it is not in a machine-processing-ready state. After all, if you don't know what you have, you can't tell what data you're missing.

Half of All Data is Dark Data

Gartner coined the term "dark data" for that half of all data that we know nothing about. And if dark matter influences so much in our universe, could dark data not have a similar impact on our ability to extract information and knowledge from data?

We have come to believe too much in the dream of AI, because what if dark data behaves like dark matter? By overestimating what is possible with data-driven decision-making, people may come to believe that the powers that be are manipulating the data.

SynerScope's driving concept is built on technology to assess the dark data within organizations. By understanding our dark data better, we can understand our world better and get better results from human and computer intelligence (AI) combined.

Algorithms Rely on Labeled Datasets

Today's AI, DL (deep learning), and ML (machine learning) need data to learn – and lots of it. Data bias is a real problem in that process. The better the training data, the better the model performs. So the quality and quantity of training data have as much impact on the success of an AI project as the algorithms themselves.

Unfortunately, unstructured data, and even some well-structured data, is not labeled in a way that makes it suitable as a training set for models. For example, sentiment analysis requires labels for slang and sarcasm. Chatbots require entity extraction and careful syntactic analysis, not just raw language. An AI designed for autonomous driving requires street images labeled with pedestrians, cyclists, street signs, etc.

Great models require solid data as a strong foundation. But how do we label the data that could help us improve that foundation, whether for chatbots, for self-driving vehicles, or for the mechanisms behind customer remediation, fraud prevention, government support programs, pandemics, and accounting under IFRS?

Regulation and pandemics appear in the same sentence because, from a data perspective, they’re similar. They both represent a sudden or undetected arrival that requires us to extract new information from existing data. Extracting that new information is only manageable for AI if training data has been labeled with that goal in mind.

Let me explain with a simple example from self-driving vehicles. Today, training data is labeled for pedestrians, bicycles, cars, trucks, road signs, prams, etc. What if, tomorrow, we decide that the AI must also adapt to the higher speed of electric bikes? You would need a massive operation of collecting new data and retraining on it, as the current models would be unlikely to perform well for this new demand.

Companies using software systems with pre-existing metadata models or business glossaries run into the same boundaries. They work by selecting and applying labels without deriving any label from the content itself; otherwise they must label by hand, which is labor- and time-intensive – often too much so under the pressure of large-scale scandals and crises.

Automatic Data Labeling and SynerScope

The need to adapt data to sudden crises leaves no room for manual labeling; automatic labeling is the better choice. But, as we know from failures by organizations and by governments, AI alone is not accurate enough to take individual content into account.

For SynerScope, the content itself should always drive descriptive labeling, and the labeling methodology should always evolve with the content. That's why we use a combination of algorithmic automation and human supervision, bringing the best of both worlds together for fast and efficient data labeling.
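As an illustration of that combination, the sketch below pairs a trivial "machine" step (proposing each group's most distinctive frequent term as a label candidate) with a human review step that accepts or overrides proposals in bulk. The names and the scoring rule are our own assumptions, not SynerScope's API:

```python
from collections import Counter

def propose_labels(groups):
    """For each document group, propose its most distinctive frequent term:
    a term common inside the group relative to its frequency in the corpus."""
    all_counts = Counter(w for docs in groups.values() for d in docs for w in d.split())
    proposals = {}
    for name, docs in groups.items():
        counts = Counter(w for d in docs for w in d.split())
        # score terms by in-group frequency relative to corpus frequency
        proposals[name] = max(counts, key=lambda w: counts[w] / all_counts[w])
    return proposals

def review(proposals, overrides):
    """Human step: accept machine proposals, overriding where needed."""
    return {g: overrides.get(g, label) for g, label in proposals.items()}

groups = {
    "cluster_0": ["invoice payment due", "invoice overdue payment"],
    "cluster_1": ["employment contract clause", "contract termination clause"],
}
labels = review(propose_labels(groups), overrides={})
```

The point of the pattern: the machine does the bulk work of sorting and proposing, and the human contributes judgment per group rather than per document.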

If you want to learn more about how our labeling works, feel free to contact us at info@synerscope.com.

Handling Redress and Remediation

Redress and Remediation

No organization wants to move into a redress and remediation process. But once you do, time is of the essence. Launching a redress investigation can happen suddenly; in other cases it involves slower planning. Either way, you suddenly have very different needs for organizational data than in business-as-usual processes. In some cases, you might even need access to data that's normally kept in the dark or on low-priority servers, which completely changes how your organization accesses that data.

Redress and Remediation Processes are High Priority

If you’re facing the need to redress, remediate, or provide compensation, you likely have pressing reasons to do so. For example, your organization may be facing dwindling customer satisfaction, supplier de-listing, legal action, regulatory action, or damage to your organization’s reputation.

Redress and remediation processes bring individual case and file details to the forefront, and resolving those details is of high importance. However, without an immediate overview, or a way to create high-quality comparisons of those individual or grouped cases quickly and efficiently, little can be done. You first have to review manually to see which cases require redress at all, and deciding what redress, remediation, or compensation should apply remains difficult. Without those overviews, you could be providing too much or too little compensation.

Resolving this means making data a central part of the process. You must implement processes to direct redress and remediation actions, and keep regulatory stakeholders informed enough that they do not escalate or start proceedings against you.

You Have to Act Fast, but Systems Aren’t Designed for Redress Processes

The default response to a redress and remediation process is to put people to work. Unfortunately, many of those people are called in ad hoc, without the information and data they need to act.

Getting started means creating in-depth overviews of each case, with enough context from similar cases to guide decisions. Putting that into a control framework allows people to get started, while avoiding the risk of overcompensating individual cases or approving fraudulent claims.

Yet making the shift from everyday data management to a full investigation of minute data is not something IT systems and support are normally designed for. Instead, you must combine data in new ways to resolve individual cases quickly and fairly. That's especially true when your cases demand bulk data access and processing, as remediation cases do: remediation never starts as a trickle of cases; you always need to address all of them, all at once. How well you can handle that bulk data determines how much damage you can mitigate, how much work and rework is necessary, and how quickly you can finalize the project to the satisfaction of customers, internal and external stakeholders, and regulatory or legal parties.

External Organizations Can’t Work Without Data

Large organizations often rely on third parties, whether specialized service providers, lawyers, consultants, or subject matter experts, to help manage these processes, often including data and IT services. However, those consultants still need access to data, which your own IT systems must supply. Moreover, when you bring in consultants for IT design and implementation, their aim is to build efficient solutions and applications, usually ones meant to run and support daily processes inside the organization.

Redress Demands Scaling Up Data-Handling Capabilities

Redress and remediation situations demand support of a very different kind: you have to greatly enhance your capacity to handle data. Think of an airplane during an emergency evacuation. People don't exit the plane in an orderly fashion using the stairs; they use the emergency slides, which greatly increase the capacity to empty the plane quickly while getting people safely to the ground.

You cannot afford to lose time preparing data or building up IT solutions for support during redress and remediation. Ad-hoc tooling with hand-written queries and spreadsheets often doesn't help either; it can add to the confusion and make problems bigger, allowing individual cases to slip through the cracks.

If you need an immediate solution for remediation and redress processes, SynerScope is here to help. Our tooling installs quickly onto your Azure tenant, with data kept under your governance, so you can quickly sort, label, and review cases with the microscopic level of detail needed to ensure proper handling. And with no changes in governance, you can implement the solution quickly and get your redress and remediation program running.

Customer case: Stedin MDM remediation

Using Dynamic Data Labelling to Drive Business Value

Dynamic Data Labelling with Ixivault

Before deriving any value from data, you need to find and retrieve the relevant data. Search lets you achieve that goal. But for search to work, we need two things: a search term defined by humans, and data indexed so the computer can find it cost- and speed-efficiently enough to keep the user engaged. Search efficiency, however, breaks under the sheer scale of all available data and the presence of dark data (with no indexes or labels attached), from both a cost and a response-time point of view.

Technologies like enterprise search never took off for exactly this reason: without labels, it's ineffective to ask a system to select results from the data. At the very moment of creating data, its creator knows exactly what a file contains. But as time passes our memories fail, and other people may be tasked with finding and retrieving data long after we've moved on. Searching data in enterprise applications often means painstakingly looking up each recorded subject or object; for end-user applications like MS Office, we lack even that possibility. Without good labels, search and retrieval are near impossible: the people who come after the creators, and the programs we build to manage data, cannot perform the mental hat trick of pulling meaning from unsorted data.

At SynerScope we offer a solution to easily recover data that was either lost over time or vaguely defined from the start. We first lift such 'unknown' data into an automated, AI-based sorting machine. Once sorted, a human data specialist gets involved and can work with sub-groups of data rather than individual files. Unsupervised, our solution presents the user with the discerning words that characterize each sub-group relative to the others. In essence, the AI presents the prime label options for the files and content in each subgroup, whatever its size in files, pages, or paragraphs. The human reviewer only has to select and verify a label option, rather than take on the heavy lifting of generating labels.
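The first, unsupervised sorting step can be pictured with a toy greedy clustering by token overlap. This is an illustrative sketch, not Ixivault's actual algorithm; the similarity measure and threshold are assumptions:

```python
def tokens(t):
    """Lowercase word set of a document."""
    return set(t.lower().split())

def greedy_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose seed it overlaps
    with (Jaccard similarity above threshold), else start a new cluster."""
    clusters = []  # list of (seed_token_set, member_docs)
    for doc in docs:
        t = tokens(doc)
        for seed, members in clusters:
            if len(t & seed) / len(t | seed) > threshold:
                members.append(doc)
                break
        else:
            clusters.append((t, [doc]))
    return [members for _, members in clusters]

docs = [
    "quarterly financial report revenue",
    "annual financial report costs",
    "server outage incident log",
    "network incident log report",
]
groups = greedy_cluster(docs)
```

A reviewer then labels whole groups at once, which is exactly what makes the approach faster than file-by-file review.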

Thus labeled, the data is ready for your established enterprise data processes. Cataloging, access management, analysis, AI, machine learning, and remediation are common end goals for data after SynerScope Ixivault generates metadata and labels.

SynerScope also allows ongoing, dynamic relabeling of data as new needs appear. That matters in this age of fast digital growth, with its constant barrage of new questions and digital needs. Ixivault's analysis and information-extraction capabilities can evolve and adapt to future requirements with ease, speed, and accuracy.

How Does Unlabeled Data Come About?

Data is constantly created and collected. When employees capture or create data, they add to files and logs. Humans are also very good at mentally categorizing data: we can navigate recent data with ease, unsorted and all. Whether it's a stack of papers or nested folders, our associative brain remembers the general idea of what is in each pile, as long as that pile doesn't move. But we are very limited in the scale we can handle. Think of the mental picture of scholars and professors working in rooms with data piled to the ceiling, where little cleaning was ever allowed. That paradigm doesn't hold for digital data in enterprises: collaboration, analysis, AI needs, and regulations all put too much pressure on knowing where data is.

Catalogs and classification solutions can help, but their level of automation for filling the catalog is too low, which leads to gaps and backlogs in labeling data. The AI for fully automatic labeling isn't there yet: cataloging and classifying business documentation is even harder than classifying digital images and video footage.

Digital Twinning and Delivering Value with Data

Before broadband, there was no such thing as a digital twin of a person, a man-made object, or a natural object. Only the necessary information was stored, in application-based data silos. In 2007, the arrival of the iPhone and the mobile revolution it triggered changed that. Everyone and everything was online, all the time, constantly generating data. The digital twin, a collection of data representing a real person or a natural or man-made object, was born.

In most organizations, these digital twins remain largely in the dark. Organizations collect vast quantities of data on clients, customer cases, accounts, and projects, but it stays in the dark because it's compiled, stored, and used in silos. When the people who created the data retire or move to another company, its meaning and content fade quickly, because no one else knows what's there or why. And without proper labels, your systems will have a hard time handling any of it.

GDPR, HIPAA, CCPA, and similar regulations force organizations to understand what data they hold about real people, and they demand the same for historic data stored from the days before those regulations existed.

Regulations evolve, technologies evolve, markets evolve, and your business evolves, all driving highly dynamic changes in what you need to know from your data. If you want to keep up, ensuring that you can use that data to drive business value while avoiding undue risks from business regulations and data privacy and security rules, you must be able to search your data. Failing this, you could get caught in a chaotic remediation procedure, accompanied by unsorted data that doesn't reduce the turmoil but adds to the chaos.

Dynamic Data Labelling with Ixivault

Ixivault helps you match data to new realities in a flexible, efficient way, with a dynamic, weakly supervised system for data labeling. The application installs in your own secure Microsoft Azure tenant, using the very data stores you set up and control, so all data always remains securely under your governance. Our solution, and its data-sorting power, helps your entire workforce, from LOB to IT, to categorize, classify, and label data by content, essentially lifting it out of the dark.

Your data then becomes accessible to all your digital processes. Ixivault shows situations and objects grouped by similarity of documentation and image recordings, and lets you compare groups for differences in content. This simplifies and speeds up the task of assigning labels to the data. Any activity that requires comparison between cases, objects, situations, or data, or a check against set standards, becomes simple. Ixivault also improves the quality of data selection, which helps in applications ranging from Know Your Customer and Customer Due Diligence to analytics and AI-based predictions using historical data.

For example, insurance companies can use that data to find comparable cases, match them to risks and premium rates, and thereby identify outliers, allowing the company to act in pricing, underwriting, binding, or all three.

SynerScope's style of dynamic labeling makes it possible to match any data, fast and flexibly. As perceptions and the cultural applications of data change over time, you can match data against evolving information-extraction needs, change labels as data contexts change, and continue driving value from the data at your disposal.

If you want to know more about Ixivault or its dynamic matching capabilities in your organization, contact us for personalized information.

Moving to the Azure cloud: unpacking dark data

Moving to the Azure cloud?

Today, more and more businesses are moving to the cloud: to automate, to take advantage of AI and scalable storage, and to reduce costs over existing legacy infrastructure. In fact, in 2021, an estimated 19.2% of large organizations made the move to the cloud. And Microsoft Azure is close to leading that shift, with roughly 60% market adoption.

Organizations often focus on selected applications during a cloud transition. However, existing data may actually present the bigger complexity. A majority of organizations use less than 50% of the data they own, and at the same time have no oversight of the data that is owned. This unused, unclassified, and unlabeled data is known as "dark data", because it stays in the dark until ample time is allocated to sort, label, and classify it.

Moving to the Azure Cloud is Like Moving House

We believe there is merit in comparing a move to the Azure cloud with moving house. You decide where to move, you choose your new infrastructure, and you get everything ready to move in. Then you pack up your old belongings and take them with you. The problem is that you likely already have plenty of boxes lying around: think of your attic, your basement, your storage, things from earlier relocations. You may have lost all knowledge of what's in there. The same holds true when your organization's applications and data must move house, except this time you also have to deal with 'boxes' of data left unlabeled by people who left the organization, data left unused for a long time, and data left behind by already obsolete applications. Moving this and other poorly known data may create bigger issues in the future.

  • Data is accumulating faster than ever before, and you'll have more of it tomorrow. Now is therefore the best time to go through your data and categorize it.
  • Proper governance of data is impossible without knowing its contents first. Older data collected from before the GDPR is still there; compliance and risk officers and CISOs dread this unknown data and fear it may fall outside compliance regulations.
  • It can be difficult to pass regulatory compliance audits with dark data around. If you can't open a 'box' of data to show auditors what's inside, you can't prove you're compliant.
  • You're also not allowed to simply delete data: industries and governments must comply with laws and regulations on archiving and maintaining open data.
  • When you know what data you have, you can strategize and move toward controlled decisions on cold/warm/hot storage, optimizing both costs and access. Moving data that is still dark may cause irreversible data loss, or at least expensive repairs, in the future.
  • Locating and accessing data requires the kind of information best captured in classifications and labels; historical data analysis needs this metadata.
  • The share of data that is dark leaves organizations vulnerable, as it makes designing and taking security precautions extra hard.
  • Sometimes you can, or must, delete information. But you can only do so if you know its contents beforehand, can determine regulatory compliance, and have the foresight to preserve future valuable analytics.
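Once data is labeled and its access pattern is known, the cold/warm/hot decision above can be reduced to a simple policy over Azure Blob Storage's access tiers (Hot, Cool, Archive). The thresholds below are illustrative assumptions for a sketch, not Azure pricing guidance:

```python
def pick_tier(days_since_last_access, must_stay_online=False):
    """Map access recency to an Azure Blob Storage access tier.
    Thresholds are illustrative; tune them to your own cost model."""
    if days_since_last_access <= 30 or must_stay_online:
        return "Hot"       # frequent access, highest storage cost
    if days_since_last_access <= 180:
        return "Cool"      # infrequent access, cheaper storage
    return "Archive"       # rarely accessed; hours-long rehydration, lowest cost

tier = pick_tier(400)  # long-untouched data lands in the cheapest tier
```

The point is that such a policy is only possible after labeling: for data that is still dark, you cannot even tell which branch applies.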

How can you make accessing this data efficient? When one of our clients, the Drents Overijsselse Delta Waterschappen, looked at archiving and storing its past project documentation in the cloud, it found the necessary manual labeling a daunting task. The massive time investment needed is much the same for other organizations making a cloud transition: manually reviewing data is simply too labor-intensive for most organizations to undertake within a feasible timeframe.

Unpacking Data with SynerScope's Ixivault

With SynerScope, you can achieve the data clarity you need. As a weakly supervised AI system, our solution is built to perform where standard AI approaches would fail. SynerScope's Ixivault installs onto your Azure tenant, with no backend of its own. This means all data stays inside your tenant, a big plus for every matter of security, governance, and compliance. Our frictionless implementation then lets you open up, categorize, and label dark data using a combination of machine learning and manual review, speeding up the full process by an average of 70%.

Ixivault analyzes your full pool of structured and unstructured data, creating categories based on data similarities, pulling keywords and distinctive terms, and generating visual overviews of those data stacks, which your domain expert can then quickly sit down and label. Most importantly, Ixivault has built-in learning capabilities: it gets better at categorizing and labeling your specific data as you use it.

All this makes Ixivault the perfect tool to help you move, unpacking boxes of data as you transfer them to the cloud. You can then choose the appropriate storage, governance, and access controls, whether or not you need to keep the data. For the first time you can have a near edge-to-edge overview of all your data, with options to zoom in to very granular levels, so you can make the best choice about what to do next with this newly discovered data. New information about your data can make you money and save you money at the same time.

If you need help unboxing your dark data as you move, contact us for more information about how SynerScope can help. You can also purchase the Ixivault app directly from Microsoft's Azure Marketplace.

Ixivault Helps Labeling and Categorizing Dark Data in the Azure Cloud

Ixivault, a managed app on Microsoft Azure

Your organization's dark data presents challenges when you move to the cloud. Yet leaving it where it currently sits is not the solution either.

Dark data is digital data that is stored but never mobilized for analysis or to deliver information. If you have dark data, your organization is already missing opportunities to derive value from it. And if you don't take dark data with you to the cloud, it drifts even further from your other data assets. Meanwhile, the flexible compute and storage infrastructure of the cloud offers a very cost-effective way to mobilize that data, and, most importantly, it does so at any scale your organization needs.

However, challenges remain: managing the risks around governance and compliance, increased storage costs, and storage tiering choices. Do you store data in close proximity so it can be synchronized with other data, but at a higher storage cost?

Migrating Dark Data to the Azure Cloud

For most organizations, failing to create and execute a dark data plan as part of the cloud transition is undesirable at best and a breach of data compliance at worst. SynerScope delivers the tools to analyze and "unlock" that data during the transition, making efficient use of cloud computing while keeping the data fully in your control. This means no additional risks arise for compliance, security, and so on.

Synerscope also helps you mobilize dark data, using a combination of machine learning, AI, and human expertise. Unlocking dark data is essential for most organizations, whether you’re shifting from legacy systems to Azure, reducing your governance footprint, or pressed into unlocking data for compliance or a regulatory audit. Synerscope’s Ixivault comes into play wherever you need both detailed and broad overviews of complex data: it sorts, categorizes, and reveals patterns, and gives domain experts the tools to label categories at speed and with high accuracy.

Your Data, Your Azure Tenant

Ixivault is a managed app on Microsoft Azure. When you deploy the tool, it installs on top of your Azure Blob or ADLS storage, where the data stays in your control. Ixivault runs on Azure computing, meaning that it dynamically scales up computing power to meet the size and complexity of the data you direct to it for scanning and computation. At no point does the data leave your Azure tenant, or any assigned secure storage used before sensitive data is separated out. SynerScope’s design suits the most stringent demands for compliance and governance. Ixivault feels and operates like a SaaS, but does so in your tenant, without any proprietary back-end for storing your data assets. Therefore, Synerscope allows you to categorize, sort, and label your dark data without introducing additional regulatory complexities. Your data stays in your cloud, the process is fully transparent, and you control and monitor your tenant for all matters related to data sovereignty.

That applies whether you’re importing data to Azure for the first time and want to inspect it before deciding where to store it, already have data in a Blob or ADLS that you must inspect, or want to open up data on legacy infrastructure.

Sorting and Categorizing Dark Data

Ixivault leverages AI and machine learning for sorting and text extraction. Visual displays offer domain experts rich and discerning context from which to choose the most suitable descriptive metadata labels. Our technology is a weakly supervised system: first, unsupervised computing handles the data in bulk, then a human operator validates the labels and the bulk-sorted data categories. The system works on raw data inputs directly, without training. Using raw data sets with human validation to add labels means we can make the system smarter over time: future raw data sets are automatically checked for similarities with previously processed data sets. So high value is delivered from day one, and the system keeps learning.
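The weakly supervised flow described above – bulk unsupervised sorting followed by one human decision per group – can be sketched as follows. This is an illustrative toy, not SynerScope’s implementation: the `signature` and `human_label` functions are stand-ins for the real clustering and expert-review steps.

```python
# Toy sketch of weak supervision: machines sort in bulk, a human labels
# each group once, and the label propagates to every member of the group.

def bulk_sort(docs, signature):
    """Unsupervised step: group documents by a cheap content signature."""
    clusters = {}
    for doc in docs:
        clusters.setdefault(signature(doc), []).append(doc)
    return clusters

def propagate_labels(clusters, human_label):
    """Supervised step: an expert labels each cluster once (seeing only a
    small sample); the label propagates to every document in the cluster."""
    return {doc: human_label(key, members[:3])
            for key, members in clusters.items()
            for doc in members}

docs = ["Invoice #1 total 100", "Invoice #2 total 250",
        "Dear Sir, regarding your claim", "Dear Madam, regarding your offer"]

# Stand-in signature: first word of the document; real systems use far
# richer features (format, layout, content).
clusters = bulk_sort(docs, signature=lambda d: d.split()[0].lower())
labels = propagate_labels(
    clusters,
    human_label=lambda key, sample: "invoice" if key == "invoice" else "letter")
```

One human decision per cluster is what makes the approach scale: the expert inspects a handful of samples, not every file.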

Ixivault abstracts data to hypervectors, comparing the similarity between data algorithmically. Using these algorithms, the AI can accurately sort data into “Stacks” of similar files. Format, layout, and content of documents are all used to separate common business documents – e.g., contracts, letters, offers, invoices, emails, brochures, claims, and different tables – and to separate sub-groups within each of these according to their actual content. Our language extraction presents distinctive groups of words from each “Stack”, allowing humans to select the most appropriate labels. The same extracted words can also be matched to business glossaries and data catalogs already available to your organization. Hypervectors allow our algorithms to detect similarities across documents holistically, at a scale beyond unaided human capacity. The resulting merge of rich ontologies and semantic knowledge is reusable throughout the organization and the many applications it runs.
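The hypervector idea can be illustrated with a textbook hyperdimensional-computing sketch: each token gets a fixed random high-dimensional vector, a document is the sum of its tokens’ vectors, and documents sharing vocabulary end up close under cosine similarity. This is a generic illustration of the concept, not SynerScope’s patented encoding.

```python
# Toy hypervector similarity: random bipolar token vectors, documents as
# elementwise sums, cosine similarity for comparison.
import math
import random

DIM = 2048
random.seed(42)
_token_hv = {}

def token_hv(tok):
    # Each token gets a fixed random bipolar (+1/-1) hypervector.
    if tok not in _token_hv:
        _token_hv[tok] = [random.choice((-1, 1)) for _ in range(DIM)]
    return _token_hv[tok]

def doc_hv(text):
    # A document's hypervector is the elementwise sum of its tokens' vectors.
    hv = [0] * DIM
    for tok in text.lower().split():
        for i, v in enumerate(token_hv(tok)):
            hv[i] += v
    return hv

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

invoice1 = doc_hv("invoice total amount due payment")
invoice2 = doc_hv("invoice payment amount total")
brochure = doc_hv("discover our exciting new product line")

# Documents sharing vocabulary land close together; unrelated ones near zero.
print(cosine(invoice1, invoice2) > cosine(invoice1, brochure))  # True
```

Because random high-dimensional vectors are nearly orthogonal, unrelated documents score near zero while overlapping ones score high, which is what makes bulk similarity sorting possible at scale.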

Machine Learning with Human Context

Ixivault creates outputs that allow your data experts to step in at maximum velocity and scale. The application displays a dashboard showing each stack of data, visual imaging of what’s in the stack, and keywords or tags pulled from that data and its metadata. Where descriptive metadata is lacking or absent, the system presents new candidate labels. It supports users in running fast and powerful data discovery cycles that link search, sorting, natural language processing, and labeling. The output is knowledge about your organization’s dark data that can be used and reused by other users and software systems.

This approach allows data experts to look at files and keywords and very quickly add tags. More importantly, it creates room for human expertise to recognize when data is outside the norm – e.g., when files relate to a special circumstance – which machines simply cannot do reliably. The result is a powerful, fast, and flexible system, usable with a wide variety of data.

Once you select the machine-proposed labels, you only have to individually inspect a small number of the actual files to confirm the labeling for an entire group of sorted files.

Unlocking Dark Data as You Move to the Cloud

Moving to Azure forces most organizations to do something with, or at least think about, their dark data. You can’t move untold amounts of data to the cloud without knowing what’s in it – a blind move like that leaves too much value unextracted. Directing data to the right storage solutions for easy governance, compliance, and management demands knowledge of its content, so that you can prioritize data for further processing and computation, or save on storage for content that adds less value. Data intelligence can largely be paid for by decreasing ‘dark storage’. Meanwhile, your organization can improve its governance footprint and ensure compliance.

Synerscope can deliver the potential value in dark data by increasing knowledge and helping with retention, access management, discovery, data cleansing efforts, data privacy protection measures, and compliance. Most importantly, dark data mining gives organizations the information needed to make business as well as IT and compliance decisions with that data – because data sits at the intersection of the three.

To learn more about Synerscope’s software and our approach, contact us to schedule a demo and see the software in action.

Delving into Dark Data on Azure – Data Governance in the Cloud

For most organizations, dark data is a vague concept: the knowledge that, somewhere, you have vast amounts of stored data – and no real idea what that data is. Gartner coined the term to refer to data which organizations collect but fail to use or monetize, and eventually lose track of.

That data – stored in network file shares, collaboration tools (e.g., SharePoint), online storage services like Drive and Dropbox, old PCs, and backups – is dark because most people in the organization have no idea what’s in it. In fact, that data is often stored in legacy systems or was placed on drives by people who have since left the organization. But as organizations move to the cloud and must choose whether to leave data where it is or move it to an Azure Blob, it becomes more of an issue – not just for its potential business value but also for regulatory compliance.

Dark Data Can Include Private Data

Dark data offers no promises in terms of delivering business value. Yet organizations cannot ignore it. Often, dark data contains everything from personally identifiable information to HR data, legal contracts, security and access information, and other confidential or proprietary information. This presents real liabilities in information governance, especially in industries such as finance and the public sector. And for global companies, it becomes increasingly crucial that data analytics and governance be addressed simultaneously to meet data privacy laws across the EU and the USA.

Knowing your enterprise data and being able to search it would be the ideal. However, the absence of labels, categories, and metadata in general makes it hard to choose what to send to AI for analysis and discovery, who receives access to which data, and what data to keep (and where to keep it). Most businesses have dark data precisely because it takes too much manual effort to sort and label. But dark data presents unknown potential and risks – without understanding its contents, no organization can make optimal decisions about what to do with it.

A Significant Governance Footprint

Both structured and unstructured data can be part of dark data, but far more unstructured than structured data resides in the dark.

Why? Unstructured data makes computerized processing more difficult, so much of this data requires significant manual processing. Azure cloud compute and storage use elasticity and scale to offer options to process all data efficiently and cost-effectively – an option not readily available in on-premises data centers. With SynerScope positioned on top of the customer’s Azure object store (Blob or ADLS), enterprises can quickly and economically see what content they have. More importantly, they can use this information to take action.

For example, the underlying contracts and correspondence for 10-year-old invoices cannot be handled without proper governance. In the Azure cloud, you can bring that data under proper governance. Yet if there are multiple back-ends from different SaaS suppliers, moving dark data to the cloud is impaired from a governance and risk perspective. That’s why SynerScope’s SaaS-like application uses the storage on the customer’s Azure tenant: all data protection and security is regulated by the single contract between the customer and Microsoft Azure. This simplicity allows the enterprise to confidently move data to the cloud, knowing that responsibilities and liabilities are clearly defined.

Categorizing Dark Data in the Azure Cloud

At Synerscope we deliver the tools to unlock dark data using machine learning for sorting by content, whilst your domain experts add context. Our AI sorts data visually, “stacking” content based on visual similarity – and highlighting keywords and descriptors pulled from the stack. Your domain expert can use that to add context to the stack – quickly identifying whether something is an invoice, a mortgage receipt, a single customer’s banking data, etc.

The software installs into your Azure tenant, leaving data in a system structure governed only by your Azure contract. SynerScope runs similarly to an Azure module: data is brought into cache memory and computed on, and the newly generated metadata augments the original data. These data artefacts are moved into storage which you, as a client, set up and manage. We provide the support for you to:

  • Find relevant structured and unstructured data, open it for control, data governance, and maintainability for GDPR compliance
  • Find and structure data for governance to meet compliance requirements in finance, public sector, etc.
  • Improve triage for files to be inspected in KYC, CDD, PDD, and AML investigations

Most importantly, this applies both to stored dark data and to the massive quantities of data churned out by CMS, self-service, surveys, and specifics like KYC programs and security. Synerscope delivers tooling to make the move to the cloud possible with dark data analysis – so that the organization implements proper governance on all data as it moves to the cloud, while creating structure and insight into new data.
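As a concrete illustration of the GDPR bullet above, a content scan might flag files matching PII-like patterns. The two regexes below are deliberately minimal stand-ins for this sketch; a production-grade scan needs far richer detection than this.

```python
# Minimal sketch of a content scan that flags files containing PII-like
# patterns (e-mail addresses and IBAN-like strings). Illustrative only.
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "iban":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def scan(files):
    """Return, per file, the sorted list of pattern names it matches."""
    findings = {}
    for name, text in files.items():
        hits = sorted(k for k, rx in PATTERNS.items() if rx.search(text))
        if hits:
            findings[name] = hits
    return findings

files = {
    "contract.txt": "Contact jan@example.com, pay to NL91ABNA0417164300.",
    "brochure.txt": "Our new product line launches this spring.",
}
print(scan(files))  # {'contract.txt': ['email', 'iban']}
```

The output of such a scan becomes descriptive metadata: files flagged as containing personal data can be routed to restricted storage before anything is published for wider use.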

Granular Insight into Big Data

Synerscope gives massive insight into not just dark data, but any data. By mapping data visually and relying on data experts to create connections, we speed up data analysis across nearly any type of data.

As a specific example, KYC is incredibly important for banks and other financial organizations. Automatic alert systems can have false positive rates of 5% or more, and each alert requires manual review. If each manual file review takes 4+ hours, a 5% false positive rate is a massive burden on the company. But Synerscope’s machine learning and AI for categorizing and sorting data speed up this manual review by as much as 20x.
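A back-of-the-envelope calculation makes that burden concrete. The yearly alert volume below is a hypothetical assumption; only the 5% false-positive rate, the 4-hour review time, and the 20x speedup come from the text.

```python
# Back-of-the-envelope cost of manually reviewing false-positive KYC alerts.
alerts_per_year = 100_000        # hypothetical volume, for illustration only
false_positive_rate = 0.05       # from the text: 5%+ false positives
hours_per_review = 4             # from the text: 4+ hours per file review
speedup = 20                     # from the text: up to 20x faster review

wasted_hours = alerts_per_year * false_positive_rate * hours_per_review
assisted_hours = wasted_hours / speedup
print(wasted_hours, assisted_hours)  # 20000.0 1000.0
```

Even at this modest hypothetical volume, false positives alone consume the equivalent of roughly ten full-time analysts per year, which is why a 20x review speedup matters.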

As data continues to accumulate in the cloud, Synerscope’s role in making day-to-day compliance and governance decisions will grow. That applies for retrieving data, deciding where to store it, and whether to keep that data in the first place.

If you would like to see how it works, contact us for a demo or pilot.

Is Your Organization Prepared to Manage Dark Data?

The Business Value of Mining Dark Data in Azure Infrastructure

As organizations accelerate the pace of digital transformation, most are moving to the cloud. In 2019, 91% of organizations had at least one cloud service, but 98% still maintained on-premises servers, often on legacy infrastructure and systems. At the same time, moving to the cloud is a given for organizations wanting to take advantage of new tools, dashboards, and data management. The global pandemic has created a prime opportunity for many to make that shift. That also means shifting data from old infrastructure to new. For most, it means analyzing, processing, and dealing with massive quantities of “dark data”.

Most importantly, that dark data is considerable. In 2019, Satya Nadella discussed Microsoft’s shift towards a new, future-friendly Microsoft Azure. In it, he explained that 90% of all data had been created in the last 2 years, yet more than 73% of total data had not yet been analyzed. This includes data collected from customers as well as data generated by knowledge workers with EUC (end-user computing, such as MSFT Office, email, and a host of other applications). As a result, the process of big data creation has only accelerated and (unfortunately) more dark data exists now than ever before.

As organizations make the shift to the cloud, move away from legacy infrastructure and towards microservices with Azure, now is the time to unpack dark data.

Satya Nadella discusses Microsoft’s shift towards a new, future-friendly Microsoft Azure

Dealing with (Dark) Data

The specter of dark data has haunted large organizations for more than a decade. The simple fact of having websites, self-service, online tooling, and digital logs means data accumulates. Whether that’s automatically collected from analytics and programs, stored by employees who then leave the company, or part of valuable business assets that are tucked away as they are replaced – dark data exists. Most companies have no real way of knowing what they have, whether it’s valuable, or even whether they’re legally allowed to delete it. Retaining dark data is primarily about compliance. Yet, storing data for compliance-only purposes means incurring expenses and risks without deriving any real value. And simply shifting dark data to cloud storage means incurring huge costs for the future organization – when dark data will have grown to even more unmanageable proportions.

Driving Value with Dark Data

Dark data is expensive, difficult to store, and difficult to migrate as you move from on-premises to cloud-hosted infrastructure. But it doesn’t have to be that way. If you know what data you have, you can bring it into scope, delete data you no longer need, and properly manage what you do need. While you’ll never use dark data on a daily, weekly, or even monthly basis, it can drive considerable value while preventing regulatory issues that might arise if you fail to unlock it.

  • Large-scale asset replacement can require retrieving decades-old data stored on legacy systems.
  • GDPR and other regulations may require showing total data assets, which means unlocking dark data to pass compliance requirements.
  • Performing proper trend analysis means utilizing the full extent of past data alongside present data and future predictions.

Dark Data is a Business Problem

As your organization shifts to the cloud, it can be tempting to leave the “problem” of dark data to IT staff. Here, the choice will often be to discard or shift it to storage without analysis. But dark data is not an IT problem (although IT should have a stake in determining storage and risk management). Instead, dark data represents real business opportunities, risks, and regulatory compliance. It influences trend and performance analysis, it influences business operations, and it can represent significant value.

For example, when Stedin, a Dutch utility company serving more than 2 million homes, was obligated to install 2 million smart meters within 36 months, they turned to dark data. Their existing system, which utilized current asset records in an ERP, achieved only 85% accuracy on “first time right” quotes for engineer visits. The result was millions in avoidable resource costs and significant customer dissatisfaction. With Synerscope’s help, Stedin was able to analyze historical data from 11 different sources, creating a complete picture of resources and a situational dashboard covering more than 70% of target clients. The result was an increase to a 99.8% first-time-right quote rate – saving millions and helping Stedin complete the project within the deadline.

Synerscope delivers the tools and expertise to assess, archive, and tag archived data – transforming dark data from across siloed back-ends and applications into manageable and useable assets in the Azure cloud. This, in turn, gives business managers the tools to decide which data is relevant and valuable, which can be discarded, and which must be retained for compliance purposes.

If you’d like to know more, feel free to contact us to start a discussion around your dark data.

 

Ixivault™ – Complete View & Control of Your Data in the Cloud

Ixivault™ – software to get on top of all your enterprise data

The cloud offers compute, memory, and storage horsepower at efficient cost to extract every bit of information from data. But it needs new software, rather than a lift-and-shift of traditional analytical software, to really obtain the benefits. And then there is the question of what data to transfer to the cloud. Data silos, structured and unstructured data, and dark data all stand in the way of an easy transfer. GDPR exacerbates this, as it introduces three main challenges that the Compliance, Risk, and Data Protection functions must deal with, and on which they base their advice to the LOBs and executive leadership:

  • Does your cloud computing solution match the sensitivity of the data you have entrusted, or want to entrust, to the cloud? If a cloud computing solution is chosen where data processing and/or storage are shared between enterprise customers, the risk of data leakage is present.
  • The question of which law applies: again, the choice of software or software platform determines whether sovereignty principles about the physical location at which certain sensitive data is held can be met.
  • The externalization of privacy requires that no cracks exist between the contracts an enterprise makes with software or platform vendors, the services companies, and the cloud vendor.

These are real hurdles to taking maximum advantage of the speed, scalability, flexibility, and cost efficiency of the cloud. Much of this potential is lost when an enterprise, at best, feels safe transferring only parts of its data. The holes in this ‘Emmental’ cheese of data get even bigger when we realize that most enterprises have nearly 70% dark data (Satya Nadella at Microsoft Inspire 2019).

SynerScope Ixivault™

We propose to scan all content of the enterprise’s data, and to use the cloud to perform that scan in a safe way. For this purpose, SynerScope introduces its Ixivault™ on Microsoft Azure. The setup is entirely within the enterprise’s own Azure tenant, through the Azure Marketplace. The loading to the cloud and the bulk scanning happen there, in what we call a vault. Unknown dark data is transformed into known data, and silo-ed data is linked, so that a grounded decision on its release for further use can be made. Data that cannot be released for wider use in the enterprise cloud tenant is deleted from the vault. Data that’s safe is published for BI, data science, and domain experts’ use by different functional departments in the company. All the original data sets stay safely in the company’s data center.

SynerScope turns dark data into bright data, ready to be used by combined human and machine intelligence for extracting information and value. Our three main solutions – Ixivault™, Ixiwa™, and Iximeer™ – are designed to handle any type of data in any combination, fast and flexibly. Unstructured text, image, and IoT data can easily be linked with structured data from ERP, CRM, and other operational systems. To facilitate widespread secure and safe working with the data, we support the (self-)publishing of data to (human-machine) analytic processes under full linkage to the Data Protection Impact Assessment (DPIA) requirements. Granular content-based access control and integrated masking and hashing functions ensure that no eyes will see, and no process will use, any ineligible data (the NoXi principle).
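The masking and hashing idea can be sketched as a salted-hash pseudonymization pass over sensitive fields before a data set is published for analytics. The field names and policy below are illustrative assumptions, not SynerScope’s actual API.

```python
# Illustrative sketch: pseudonymize sensitive fields with a salted hash
# before publishing, so analytics can still join on the token while the
# original value stays hidden. Field names and salt are hypothetical.
import hashlib

def pseudonymize(record, sensitive_fields, salt="per-project-secret"):
    out = dict(record)
    for field in sensitive_fields:
        if field in out:
            digest = hashlib.sha256((salt + str(out[field])).encode()).hexdigest()
            out[field] = digest[:16]  # stable token; original value is not recoverable without the salt
    return out

claim = {"claim_id": 1234, "customer_name": "J. Jansen", "amount": 120.0}
published = pseudonymize(claim, sensitive_fields=["customer_name"])
```

Because the same input always maps to the same token, records about the same person can still be linked in analysis, while no analyst ever sees the underlying name.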

To further serve the compliance, risk, and data protection functions of the company, our systems log every touch point of the data. Humans and machines can be audited for having worked inside the boundaries of the Standard Operating Procedures (SOPs) set with each project DPIA. Providing evidence for always-appropriate use of data is efficiently supported by SynerScope.

Azure

We have built our solution specifically to operate in your own Azure cloud-tenant environment. As an enterprise, you agree directly with Microsoft Azure on the SLAs with regard to data security. Adding SynerScope doesn’t affect these SLAs, and publishing of data inside your enterprise is linked to your own Azure AD setup. SynerScope prefers an Azure setup where all data lives in the ADLS/Blob store (and in its original application source system). We believe data should remain open for use in different applications.

SynerScope is proud to be a partner of Microsoft. We continuously look to exploit the expanding functions of Azure modules. Our serverless architecture provides the flexibility to deploy in any enterprise’s Azure tenant; we gladly welcome you to discuss opportunities to take advantage of your own tenant architecture.

 

Ixivault™ on Azure

SynerScope Solution

The flexibility and speed of the SynerScope solution are secured by our patented data scanning, matching, and visualization technology.
All of this we have developed to aid your move towards a data-driven architecture in the cloud, in full compliance and with a view to providing you with significant cost reductions.
The solution is tried and tested to the full satisfaction of our clients in the financial & insurance industry and the critical-infrastructure industry.

“SynerScope’s platform might best be described as a data lake management product that covers everything from automated ingestion, through discovery and cataloguing to data preparation.” (BloorInDetail – 2018)

SynerScope will bring you:

Cost Reduction

  • Data warehouse optimization / application rationalization
  • Optimized use of data, cloud data storage and your cloud data warehouse architecture
  • SynerScope as migration and data publishing tool
  • Broad use of data in the LOB by domain experts, citizen data scientists or data scientists proper.

Efficiency improvement IT-operations

  • Incident/problem root-cause analysis, reducing time to repair
  • Reduction of incidents and problems
  • Usage of historical data and archives

The SynerScope Solution:

  • Extracting and accelerating insight from data with patented products
  • Open technology on Microsoft Azure
  • Deployment via Microsoft marketplace
  • Standalone deployment also possible
  • Complementary to third-party data cataloguing tools, adding considerably to the ease of use of unstructured data
  • Security and compliance proof (traceable & transparent)
  • Fast deployment

SynerScope References:

  • Financial & Insurance industry
  • Critical Infrastructure
  • Government (safety regions and smart city)

Real-time Insight in All Data, Present and Past

The promise of data-driven working is great: risk-based inspection, finding new revenue models, reducing costs, and delivering better products and services. Every company wants this, but it often fails.
The most important business data cannot be fully unlocked by traditional analysis tools. Why does it go wrong, and how do you manage to convert all data into insight?

The more you know, the more efficient and better you can make your products and services. That is why data-driven working is high on the agenda of many organizations. But AI and BI tools deliver only partially, or not at all, on the promise of data-driven working. That’s because they can only analyze part of the entire mountain of data. And what is an analysis worth if you can only examine half, or a quarter, of the data you have?

Insights are hidden in unstructured data

Many organizations started measuring processes in ERP and CRM systems over the past 20 years. They store financial data, machine data, and all kinds of sensor data. This measurement data is easy to analyze but does not tell you where things go wrong during the entire operation.
This so-called structured data provides only partial insight, while you are looking for answers in the analysis of all data. It is estimated that 80% to 90% of all data in organizations is unstructured: uncategorized data stored in systems, notes, e-mails, handwritten notes on work drawings, and all kinds of documents across the organization. This valuable resource remains unexplored.

The unexplored gold mine of unstructured data

Organizations have terabytes of it: project information, notes, invoices, tenders, photos, and films that together can yield an enormous amount of insight. Yet this fast-growing gold mine is more of a data maze. Over the years, digitization took place step by step, process by process, and department by department. During this digitization in slow motion, no one thought it useful to coordinate all information in such a way that it could easily be analyzed later.

Artificial Intelligence and BI tools get lost

Departments of factories, offices, and government agencies created their own data worlds through this so-called ‘island automation’: separate silos of application data and process data such as spreadsheets, presentations, invoices, tenders, and texts in all kinds of file formats. Moreover, departments and people all categorize information differently, and not in the structured way a computer would. Not everyone administers equally neatly, or categories are missing, so colleagues simply write a lot of data away in the “other” field. The problem is that BI and AI tools cannot properly look into this essential, unstructured information. They lack signage, so they get lost in the maze of unstructured data.

Turning archives into accessible knowledge (and skills)

For many companies, the future lies in the past. Most organizations have boxes full of archived material from the pre-digital era, which they are now digitizing at a rapid pace. Decades of acquired knowledge and experience are stored, but hidden, in these archives, because, like many digital files, they are not well structured. Who neatly sorted their project notes or files into different categories, if categories were available at all? If you want to use this unstructured data now, it will take you hundreds of hours of manual work to analyze. SynerScope’s technology searches terabytes or petabytes of data within 72 hours and provides immediate answers from all your data.

Unstructured data harbors new revenue models

How does that work? A non-life insurer did not know exactly where 25% of its insurance payments went. So SynerScope automatically examined the raw texts of millions of damage claims from the last 20 years. The words “broken screen” came up immediately for claims above 100 euros. The graph showed that screen breakage was rare until 2010, but then grew explosively. What happened? The insurer had never created a category for smartphones or tablets. As a result, they missed a major cost item – or, to put it positively: for years they overlooked a new revenue model.
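At its core, the kind of raw-text scan described above boils down to counting a phrase’s occurrences per year across claim texts. A minimal sketch with toy claim data (the claims below are invented for illustration):

```python
# Count how often a phrase appears in claim texts, per year, to surface
# trends that no predefined category ever captured. Toy data only.
from collections import Counter

claims = [
    (2008, "Water damage in kitchen"),
    (2009, "Stolen bicycle from shed"),
    (2011, "Broken screen on smartphone"),
    (2012, "Broken screen, tablet dropped"),
    (2012, "Broken screen, phone fell"),
]

def phrase_trend(claims, phrase):
    trend = Counter()
    for year, text in claims:
        if phrase in text.lower():
            trend[year] += 1
    return dict(trend)

print(phrase_trend(claims, "broken screen"))  # {2011: 1, 2012: 2}
```

A sudden jump in such a trend line is exactly the kind of signal that revealed the missing smartphone category: the data contained the pattern all along, but no field in the claims system could show it.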


Turn data into progress

Thanks to the power of cloud computing in Azure, SynerScope is able to analyze large amounts of data in real time. And it doesn’t matter what kind of data it is: spreadsheets, meeting minutes, drone images, filing cabinets full of invoices, you name it! Do you have hundreds of terabytes or even petabytes of satellite or drone data? Then it will be in the model tomorrow! Thanks to the analysis of the present and the past, organizations with SynerScope’s software live up to the promise of data-driven working. Leading companies such as Achmea, ExxonMobil, Stedin, VIVAT, and De Volksbank are converting their data into progress with SynerScope’s solution.
Do you also want insight into your present & past to get a grip on the future?
Then request a demo!