Author: Stef van den Elzen
Nearly every company collects and stores large amounts of data. One of the main reasons is that data storage has become very cheap. But although storage is cheap, the data also needs to be protected and managed, which is often not done well. Obviously, not protecting the data puts your company at risk. More surprisingly, not managing the data brings an even higher risk. If the data is not carefully indexed and stored, it becomes invisible and underutilized, and is eventually lost in the dark. As a consequence, the data cannot be used to the company's advantage to improve business value. This is what Gartner calls dark data: “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”
The potential of dark data is largely untapped. Actively exploring and analyzing it enables companies to make data-driven decisions, develop strategy, and unlock hidden business value. However, companies face two main challenges: discovery and analysis.
Not only is dark data invisible, it is often stored in separate data silos, isolated per process, department, or application, and all of it is treated the same despite wide variation in value. There is no overview of all data sources or of how they are linked and related to each other. Also, because the silos are detached and the data is stored for business purposes, it lacks structure and metadata, which makes it hard to determine its original purpose. As a consequence, there is no navigation mechanism to effectively search, explore, and select this wealth of data for further analysis.
A large portion, roughly 80-90%, of this dark data is unstructured. In contrast to numerical data, it consists of text, images, video, and so on. Companies lack the infrastructure and tools to analyze this unstructured data. Business users cannot ask questions of the data directly but need the help of data scientists. Furthermore, it is important not to analyze one data source in isolation, as currently happens with specialized applications, but to link multiple heterogeneous data sources (reports, sensor data, geospatial data, time series, images, and numbers) in one unified framework that provides better context and multiple perspectives on the data.
The SynerScope solution helps companies overcome the challenges of discovery and analysis, and at the same time supports them with infrastructure and architecture.
SynerScope serves as a data lake and provides a world map of the diverse and scattered data landscape. It shows all data sources, the links between them, their similarity, data quality, and key statistics. Furthermore, it provides navigation mechanisms and full-text search for effortless discovery of potentially valuable data. In addition, the platform enables collaboration, tracks data provenance, and makes it easy to augment data. Once interesting data has been discovered and its quality assessed, it can be selected for analysis.
With SynerScope, all types of data, such as numbers, text, images, network, geospatial, and sensor data, can be analyzed in one unified framework. Questions of the data can be answered instantly, as they are formed, using intuitive query and navigation mechanisms. Our solution bridges the gap between data scientists and business users and engages a new class of business users to illuminate the dark data silos for a truly data-driven organization. At SynerScope we believe in data as a means, not an end.
Example SynerScope Marcato multi-coordinated visualization setup for rich heterogeneous data analysis: numbers, images, text, geospatial data, and a dynamic network, all linked and interactive.