Using Smart City Data for Catastrophe Management: It Still Is All About the Data …

Author: Monique Hesseling

Recently I had the pleasure to work with SMA’s Mark Breading on a study about the development of smart cities and what this means for the insurance industry (Breading M, 2017. Smart cities and Insurance. Exploring the Implications. Strategy Meets Action).

In this study Mark touched on a number of very relevant smart-city-related technical developments that impact the insurance industry, such as driverless cars, smart buildings, improved traffic management, energy reduction, and sensor-driven monitoring of health and well-being. Mark also explored how existing risks, and the way carriers assess them, might change, and how these risks might be reduced by new technologies. However, new risks will evolve, especially around liability (think cyber, or who is to blame for errors made by technology in a driverless car or in automated traffic management). Mark concluded that insurers will have to be ready to address these changes in their products, risk assessments and risk selection.

Here in the USA, the last few weeks have shown again how much impact weather can have on our cities, lives, communities, businesses and assets. We have all seen the devastation in Texas, Florida, the Caribbean and other areas and felt the need to help. As quickly as possible. Insurance carriers and their teams, too, go out of their way (often literally) to assist their clients in these overwhelmingly trying times. And I learned in working with some of them that insurers and their clients can also benefit from “smart city technologies” in times of massive losses.

I have seen SynerScope and other technology being used to monitor wind and water levels, overlaying this data with insured risks and exposure data. Augmented with drone and satellite pictures and/or smart building and energy grid sensor data (part of which is often publicly available and open), this information gives a very quick first assessment of damage (property and some business interruption) to specific locations. I have seen satellite and drone pictures of exposures being machine analyzed, augmented with other data and deployed in an artificial intelligence/machine learning environment that uses similarity analyses to quickly identify other insured exposures that have most likely incurred similar damage. This enables adjusters to get involved proactively in addressing a potential claim, hopefully limiting damage and getting the insured back to normal as soon as possible. Another use of this application is, of course, fraud detection.
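The similarity step described above can be sketched in a few lines. This is only an illustration, assuming each insured location has already been reduced to a small numeric feature vector; the feature names, scaling factors, identifiers and distance threshold below are hypothetical and are not taken from SynerScope or any carrier.

```python
import math

def scaled_distance(a, b, scale):
    """Euclidean distance after dividing each feature by its scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))

def similar_exposures(reference, portfolio, scale, max_distance=0.25):
    """Return ids of exposures whose features resemble a known damaged site."""
    return [
        exposure_id
        for exposure_id, features in portfolio.items()
        if scaled_distance(reference, features, scale) <= max_distance
    ]

# Hypothetical features: [peak wind (mph), water level (ft), image damage score]
scale = [150.0, 10.0, 1.0]
damaged_site = [120.0, 6.0, 0.9]
portfolio = {
    "P-001": [118.0, 5.5, 0.8],   # close to the damaged site
    "P-002": [40.0, 0.0, 0.1],    # barely affected
    "P-003": [125.0, 7.0, 0.95],  # close to the damaged site
}

flagged = similar_exposures(damaged_site, portfolio, scale)
print(flagged)
```

Scaling before measuring distance keeps one feature with large raw numbers (wind speed here) from dominating the comparison. A production system would more likely use learned image embeddings and an approximate nearest-neighbour index.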

Smart City Data

Smart city projects use technology to make daily life better for citizens, business and government. The (big) data these projects generate, however, can also be very helpful in dealing with a catastrophe and its aftermath. We don’t always think creatively about re-using our data for new purposes. Carriers, governments and technology providers should explore this more together. To make our cities even smarter, also in bad times.

Download the report:

Smart Cities and Insurance


Data Lakes, Buckets and Coffee Cups

Author: Monique Hesseling

Over the last few years, primarily large carriers, and especially the more “cutting edge” ones (for all the doubters: yes, there is such a thing as a cutting edge insurer), have invested in building data lakes. The promise was that these lakes would enable them to use and analyze “big data” more easily, and gain insights that would change the way we all do business. Change our business for the better, of course. More efficient, better customer experiences, better products, lower costs. In my conversations with all kinds of carriers, I have learned that I am not the only one who struggles to totally grasp this concept:

A midsize carrier’s CIO in Europe informed me that his company was probably too small for a whole lake, and asked me if he could start with just a “data bucket”. His assumption was that many buckets would ultimately make up a lake. Another carrier’s CIO explained to me that she is the proud owner of a significant lake. It is just running pretty dry, since she analyzes, categorizes and normalizes all data before dumping it in. She explained that she was filling a big lake with coffee cups full of data. It would take her a long time to get that lake filled.

You might notice that these comments all dealt with the plumbing of a big data infrastructure; the carriers did not touch on analytics and valuable insights yet. Let alone on operationalizing insights or measurable business value. Many carriers seem to be struggling with the classical pain-point of ETL, also in this new world.

By digging into this issue with big data SMEs, I learned that this ETL issue is more a matter of perception than a technological problem. Data does not have to be analyzed and normalized before being dumped into lakes, and it can still be used for analytical exercises. Hadoop companies such as Hortonworks, Cloudera or MapR, or integrated cloud solutions such as the recently announced Deloitte/SAP HANA/AWS solution, provide at least part of the solution to dive and snorkel in a lake without restricting oneself to dipping a toe in a bucket of very clean and much analyzed data.

And specialized firms such as SynerScope can prevent weeks, months or even longer of filling that lake with coffee cups full of clean data by providing capabilities to fill lakes with many different types of data fast (often within days) and at a low cost. Adding their capabilities in specialized deep machine learning to these big data initiatives allows for secure, traceable and access controlled use of “messy data” and creates quick business value.

Now, for all of us data geeks, it feels very uncomfortable to work with, or enable others to work with data that has not been vetted at all. But we’ll have to accept that with the influx of the massive amounts of disparate data sources carriers want to use, it will become more and more cost and time prohibitive to check, validate and control every piece of data being used by our businesses at point of intake into the lake. Isn’t it much smarter to take a close look at data at the point where we actually use it? Shifting our thinking that way, coupled with technology available, will enable much faster value out of our big data initiatives. I appreciate that this creates a huge shift in how most of us have learned to deal with data management. However, sometimes our historical truths need to be thrown overboard and into the lake before we can sail to a brighter future.
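To make the idea of checking data at the point of use concrete, here is a minimal sketch. It assumes raw records were dumped into the lake as unvalidated JSON lines; the record layout and field names (`claim_id`, `amount`) are made up for illustration. Only the fields a particular analysis needs are parsed and validated, and only when that analysis runs.

```python
import json

# Hypothetical raw lake content: nothing was cleaned at intake.
RAW_LAKE = [
    '{"claim_id": "C1", "amount": "1200.50", "cause": "water"}',
    '{"claim_id": "C2", "amount": "n/a", "cause": "fire"}',
    'not even json',
    '{"claim_id": "C3", "amount": 300, "note": "horse kicked fence"}',
]

def read_amounts(raw_records):
    """Parse and validate only what this analysis needs, at read time."""
    usable, rejected = [], 0
    for line in raw_records:
        try:
            record = json.loads(line)
            usable.append((record["claim_id"], float(record["amount"])))
        except (ValueError, KeyError, TypeError):
            # The raw record stays in the lake untouched; it is only
            # skipped for this particular question.
            rejected += 1
    return usable, rejected

amounts, skipped = read_amounts(RAW_LAKE)
print(amounts, skipped)
```

A different analysis, for example one that mines the free-text `note` field, could read the very same records with its own, different checks.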

Dataworks Summit Munich and Dreams Coming True

Author: Monique Hesseling

Last week the SynerScope team attended the Dataworks Summit in Munich: “the industry’s premier big data community event”. It was a successful and well-attended event. Attendees were passionate about big data and its applicability to different industries. The more technical people learned (or in the case of our CTO and CEO: demonstrated) how to get the most value quickly out of data lakes. Business folks were more interested in sessions and demonstrations on how to get actionable insights out of big data, use cases and KPIs. Most attendees came from the EMEA region, although I regularly detected American accents as well.

It has been a couple of years since I last attended a Hadoop/big data event (I believe it was 2013) and it was interesting last week to see the field maturing. Only a few years ago, solution providers and sessions focused primarily on educating attendees on the specifics of Hadoop, data lakes, definitions of big data and theoretical use cases: “wouldn’t it be nice if we could..”. Those days are gone. Already in 2015, Betsy Burton from Gartner discussed in her report “Hype Cycle for Emerging Technologies” that big data had quickly moved through the hype cycle and had become a megatrend, touching on many technologies and ways of automation. This became obvious at this year’s Dataworks Summit. Technical folks questioned how to quickly give their business counterparts access to, and control over, big data driven analytics. Access control, data privacy and multi-tenancy were key topics in many conversations. Cloud versus local still came up, although the consensus seemed to be that cloud is becoming unavoidable, with some companies and industries adopting it faster than others. Business people inquired about use cases and implementation successes. Many questions dealt with text analysis, although a fair number of people wanted to discuss voice analysis capabilities and options, especially for call center processes. SynerScope’s AI/machine learning case study of machine-aided tagging and identification of pictures of museum artifacts also got a lot of interest. Most business people, however, had a difficult time coming up with business cases in their own organizations that would benefit from this capability.

This leads me to an observation that was also made in some general sessions: IT and technical people tend to see Hadoop/data lake/big data initiatives as a holistic undertaking, creating opportunities for all sorts of use cases in the enterprise. Business people tend to run their innovation by narrowly defined business cases, which forces them to limit the scope to a specific use case. This makes it difficult to justify and get funding for big data initiatives beyond pilot phases. We would probably all benefit if both business and IT considered big data initiatives holistically at the enterprise level. As was so eloquently stated in Thursday’s general session panel: “Be brave! The business needs to think bigger. Big Data addresses big issues. Find your dream projects!” I thought it was a great message, and it must be rewarding for everybody working in the field that we can start helping people with their dream projects. I know that at SynerScope we get energized by listening to our clients’ wishes and dreams and making these into realities. There is still a lot of work to be done to fully mature big data and big insights, and make dreams come true, but we have all come a long way since 2013. I am sure the next step on this journey to maturity will be equally exciting and rewarding.

Artificial Intelligence ready for “aincient” cultures?

Author: Annelieke Nagel


Google,, Synerscope and the Dutch National Museum of Antiquities are creating a revolutionary acceleration in antiquities research

Last Monday I was present at the launch of a fantastic initiative for Egyptian art lovers around the world! A more apt setting was not possible as the presentations were organized in front of the Temple of Taffeh, an ancient Egyptian temple built by order of the Roman emperor Augustus.

Egyptologist Heleen Wilbrink, founder of, Andre Hoekzema, Google country manager Benelux and Jan-Kees Buenen, CEO SynerScope were the presenters that afternoon. is the driving force behind this pilot project. Thus all presentations were geared towards explaining the need for protection of the world heritage through digitally capturing the art treasures and even more importantly, being able to research them and accelerate discoveries by merging all data sources.  To secure the progress of this kind of research, it also depends on support of outside funds. (If you are interested, please go to for further information)

The current online collection of the Dutch National Museum of Antiquities (Rijksmuseum van Oudheden (RMO)) consists of around 57,000 items and can now be searched within hours, in a way previously not possible, thanks to SynerScope’s powerful software built on top of the Google Cloud Vision API.

The more in-depth technical explanation of the software and partnerships involved was compelling, as it linked Artificial Intelligence and deep learning together with artifacts and an open mind, in order to make this project possible.

This unique pilot program needed to unlock all data available (text, graphs, photos/video, geo, numbers, audio, IoT, biomed, sensors, social) easily and very fast!

The large group of objects (60,000 in this instance, but the RMO has another 110,000 to do) from various siloed databases was categorized and brought together in SynerScope’s data visualisation software: images and texts simultaneously available, linked to a time and location indicator. The system indicates the metadata and descriptions certain items have in common, and the similarities in appearance.

As CEO Jan-Kees Buenen put it: “At SynerScope, we offer quick solutions to develop difficult-to-link data and databases, making them comprehensible and usable”.

Through the RMO online collection can be linked to external databases from other museums around the world. Thus it generated a lot of interest from museums like Teylers Museum Haarlem, Stedelijk Museum Amsterdam and Foundation Digital Heritage (Stichting DEN). They were all present to absorb the state-of-the-art information that was presented. Interestingly enough, some Egyptologists present expressed slight scepticism about embracing this new technology to unlock the ancient culture.

We will soon notice that the outcome of the researched data will be used as a source of inspiration for new exposition topics, and I am sure it will also progressively serve the worldwide research community.

I believe this latest technology is the future of the past!

Innovation in action: Horses, doghouses and winter time…

Author: Monique Hesseling

During a recent long flight from Europe, I read up on my insurance trade publications. And although I now know an awful lot more about blockchain, data security, cloud, big data and IoT than when I boarded in Frankfurt, I felt unsatisfied by my readings (for the frequent flyers: yes, the airline food might have had something to do with that feeling). I missed real-life case studies, examples of all this new technology in action in normal insurance processes, or integration into down-to-earth daily insurer practices. Maybe not always very disruptive, but at least pragmatic and immediately adding value.

I know the examples I was looking for are out there, so I got together with a couple of insurance and technology friends and we had a great time identifying and discussing them. For example, the SynerScope team in the Netherlands told me that their exploratory analysis on unstructured data (handwritten notes in claims files, pictures) demonstrated that an unexplained uptick in homeowners claims was caused by events involving horses. Now think about this for a moment: in the classical way of analyzing loss causes we start with a hypothesis and then either verify or falsify that. Honestly, even in my homeland I do not think that any data analyst or actuary would create a hypothesis that horses would be responsible for an uptick in homeowners losses. And obviously “damage caused by horse” is not a loss category on the structured claims input under homeowners coverage either. So until not too long ago, this loss cause either would not have been recognized as significant, or it would have taken analysts an enormous amount of time and a lot of luck to identify it by sifting through massive amounts of unstructured data. The SynerScope team figured it out with one person in a couple of days. Machine augmented learning can create very practical insights.

In our talks, we discovered these types of examples all over the world. Here in the USA, a former regional executive at a large carrier told me that she found an uptick in house fires in the winter in the South. One would assume that people mistakenly set their house on fire in the winter with fireplaces, electrical heaters, etc. to stay warm. Although that is true, a significant part of the house fires in rural areas was caused by people putting heating lamps in dog houses to keep Fido warm. Bad idea. Again, there was no loss code for “heating lamp in doghouse” in structured claims reporting processes, nor was it a hypothesis that analysts thought to pose. So it took the trending of loss data over years before the carrier noticed this risk and took action to prevent and mitigate these dreadful losses. Exploratory analysis on unstructured claims file information in a deep machine learning environment, augmented with domain expertise and a human eye (as in the horse example I mentioned earlier), would have identified this risk much faster. We went on and on about case studies like those.

Now, although I am a great believer in, and firm supporter of, healthy disruption in our industry, I think we can support innovation by assisting our carriers with these kinds of very practical use cases and value propositions. We might want to focus on practical applications that can be supported by business cases, augmented with some less business case driven innovation and experimenting. I firmly believe that a true partnership between carriers, instech firms and distribution channels, and a focus on innovation around real-life use cases, will allow for fast incremental innovation and will keep everybody enthusiastic about the opportunities of the many new and exciting technologies available. While doing what we are meant to do: protecting homes, horses and human lives.

First Time Right

First time right: sending the right qualified engineer to the address of installation.

To ensure a continuous and reliable power supply to households in the coming years, energy providers are replacing old meters with new smart meters. The old, traditional meter is not prepared for the smart future and not suitable for new services and applications that help reduce energy consumption.

A wide range of meters of different ages are currently used in houses across the country. Some of these are too old or too dangerous, which means only engineers holding special certificates can exchange them for the new smart meters. Currently it is guesswork what type of meter is to be replaced upon arrival at the address of installation. And so it happens too often that an engineer has to leave empty-handed, unable to carry out the planned job. This means the resident must be present to open the door twice, which is very inconvenient. The big question is: how to send the right qualified engineer first time round?

For inventory reasons, energy companies started to ask their engineers to take photographs of the meters they repaired or exchanged during these last years. Over the years pictures of meter boxes in all shapes and sizes were gathered. SynerScope is able to take these pictures and add relevant data available from open sources like information on location of homes, date of construction and pictures of neighborhoods. This way SynerScope creates profiles of where a certain type of meter box can be found. As not all meter boxes are documented it is now possible, based on these created profiles, to make the right prediction about the type of meter in a certain home that needs to be replaced. Thus sending the right engineer first time round, leading to happy faces for both the resident and the energy company.
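The profiling idea can be illustrated with a deliberately simple sketch. It assumes documented homes are described only by a neighborhood and a construction year, and predicts the most common documented meter type among similar homes. The field names, meter labels and the per-decade grouping are hypothetical; the approach described above also draws on photographs and other open data, which this sketch leaves out.

```python
from collections import Counter, defaultdict

def decade(year):
    """Round a construction year down to its decade, e.g. 1965 -> 1960."""
    return year // 10 * 10

def build_profiles(documented_homes):
    """Count meter types per (neighborhood, construction decade) profile."""
    profiles = defaultdict(Counter)
    for home in documented_homes:
        key = (home["neighborhood"], decade(home["built"]))
        profiles[key][home["meter_type"]] += 1
    return profiles

def predict_meter(profiles, neighborhood, built):
    """Most common documented meter type for similar homes, or None."""
    counts = profiles.get((neighborhood, decade(built)))
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical documented homes (meter type known from engineer photos).
documented = [
    {"neighborhood": "North", "built": 1962, "meter_type": "legacy-A"},
    {"neighborhood": "North", "built": 1965, "meter_type": "legacy-A"},
    {"neighborhood": "North", "built": 1968, "meter_type": "legacy-B"},
    {"neighborhood": "South", "built": 1994, "meter_type": "modern-C"},
]

profiles = build_profiles(documented)
prediction = predict_meter(profiles, "North", 1960)  # undocumented home
unknown = predict_meter(profiles, "East", 1975)      # no similar homes on file
print(prediction, unknown)
```

Returning `None` when no comparable homes exist matters in practice: it tells the planner that the visit is still guesswork, so a certified engineer may be the safer choice.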

The Panama Papers: advances in technology leave nowhere to hide

Having identified international politicians, business leaders and celebrities involved in webs of suspicious financial transactions, the International Consortium of Investigative Journalists (ICIJ) is now being asked by tax authorities to provide access to the 11 million leaked documents it has been handling over the past year. Meanwhile conspiracy theories are running wild over the source of the leak, which insisted on communicating using only encrypted channels.

Data leaks are becoming more common but also getting larger. The Panama Papers leak contained 11.5 million documents that were created between the 1970s and late-2015 by Mossack Fonseca. The 2.6 terabytes of data is equivalent to 200 high-definition 1080p movie files and far larger than the Edward Snowden trove.

Mixing different sources of data

The Panama Papers show a world of tax evasion and tax dodging. The actors achieve their goals by establishing networks of off-shore companies, some of which completely hide the ultimate beneficiary ownership.

Untangling such networks with conclusive proof on individual entities requires a clear view on activities in rich context. Technology helped the ICIJ with digging through vast amounts of digital data, but it still was a slow and tedious process.

The latest technology from SynerScope demonstrates how to deliver speed at scale for such tasks. Its ability to link disparate data quickly results in ultra-rich context at speed.

Fast delivery of rich context

In record time SynerScope was able to reveal from the Panama Papers all entities involved (people and companies), together with their various relationships and locations. The relationships and entities are expanded with the unstructured data of original text and image files.

We show this in the screenshot below by quickly adding-in original patent documentation for those owners whose name also appeared in the Panama Papers.

SynerScope presents the mixture of data in a single pane of glass, where each tile interacts with the other:

Tile 1: The original document from the Panama Papers.

Tile 2: Helps to determine the topics specific to your selected (orange) sub-network.

Tile 3: Shows the network in detail.

Tile 4: The original USPTO patent document.

Tile 5: Shows the location of selected (orange) versus non-selected (grey) entities in the network.

Tile 6: Shows the selected sub-network (orange) against the network overview of all connections.

SynerScope is able to add even more context like similar data from other leaks (SwissLeaks, LuxLeaks, OffshoreLeaks) and Chamber of Commerce data from various countries depending on who is looking. Our technology provides context at high speed, saving thousands of man days of data research.
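As a rough illustration of what untangling a network and mixing in a second source involves, the sketch below builds an undirected graph from entity relationships, selects the sub-network reachable from one entity, and intersects it with a list of patent owners. All names and relationships are invented placeholders, and this is not a description of how SynerScope implements it.

```python
from collections import defaultdict, deque

def build_graph(relationships):
    """Undirected adjacency sets from (entity, entity) pairs."""
    graph = defaultdict(set)
    for a, b in relationships:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def sub_network(graph, start):
    """All entities reachable from start, found by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return seen

# Hypothetical leak relationships: (officer or company, company).
relationships = [
    ("Person A", "Shell Co 1"),
    ("Shell Co 1", "Shell Co 2"),
    ("Person B", "Shell Co 3"),
]
# Hypothetical second source: names listed as patent owners.
patent_owners = {"Person A", "Person C"}

graph = build_graph(relationships)
selected = sub_network(graph, "Shell Co 2")
owners_in_selection = selected & patent_owners
print(sorted(selected), owners_in_selection)
```

Matching on exact names is the weakest part of such a sketch; real entity resolution has to cope with spelling variants, transliteration and shared names.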



SynerScope illuminates Dark Data

Author: Stef van den Elzen

Nearly every company is collecting and storing large amounts of data. One of the main reasons is that data storage has become very cheap. However, although storage may be cheap, the data also needs to be protected and managed, which is often not done very well. Obviously, not protecting the data puts your company at risk. More surprisingly, not managing the data brings an even higher risk. If the data is not carefully indexed and stored, it becomes invisible and underutilized, and is eventually lost in the dark. As a consequence the data cannot be used to the company’s advantage to improve business value. This is what Gartner calls dark data: “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”

The potential of dark data is unimagined; performing active exploration and analytics enables companies to implement data-driven decision-making, strategy development, and unlock hidden business value. However, there are two main challenges companies are facing: discovery and analysis.


Not only is the dark data invisible, it is often stored in separate data silos, isolated per process, department, or application, and all treated the same despite the widespread variation in value. There is no overview of all data sources or how they are linked and related to each other. Also, because all silos are detached and data is stored for business purposes, it lacks structure or metadata, which hinders the determination of its original purpose. As a consequence there is no navigation mechanism to effectively search, explore, and select this wealth of data for further analysis.


A large portion, roughly 80-90%, of this dark data is unstructured. So in contrast to numbers it consists of text, images, video, etc. Companies lack the infrastructure and tools to analyze this unstructured data. Business users are not able to directly ask questions to the data but need the help of data scientists. Furthermore, it is important not only to analyze one data source in isolation, as currently occurs with specialized applications, but to link multiple heterogeneous data sources (reports, sensor, geospatial, time-series, images, and numbers) in one unified framework for a better context understanding and multiple perspectives on the data.


The SynerScope solution helps companies overcome the challenges of discovery and analysis and simultaneously helps customers with infrastructure and architecture.

SynerScope serves as a data lake and provides a world map of the diverse and scattered data landscape. It shows all data sources, the linkage between them, similarity, data quality, and key statistics. Furthermore, it provides navigation mechanisms and full text search for effortless discovery of potentially valuable data. In addition, this platform enables collaboration and data provenance, and makes it easy to augment data. Once interesting data is discovered and its quality is assessed, it is selected for analysis.

With SynerScope all types of data, such as numbers, text, images, network, geospatial and sensor data, can be analyzed in one unified framework. Questions of the data can be answered instantly while they are formed, using intuitive query and navigation interaction mechanisms. Our solution bridges the gap between data scientists and business users and engages a new class of business users to illuminate the dark data silos for a truly data-driven organization. At SynerScope we believe in data as a means, not an end.

Example SynerScope Marcato multi-coordinated visualization setup for rich heterogeneous data analysis; numbers, images, text, geospatial, dynamic network, all linked and interactive.


Visual Analytics with TensorFlow and SynerScope

Author: Stef van den Elzen

TensorFlow is an open source software library for numerical computation using data flow graphs. The project was originally developed by the Google Brain team and was recently made open source. Reason enough to experiment with it.

Thanks to its flexible architecture we can use it not only for deep learning but also for generic computational tasks that can be deployed on multiple CPUs/GPUs and platforms. By combining the computational tasks with SynerScope’s visual front end, which also allows for interactive exploration, we have a powerful, scalable data sense-making solution. Let’s see how we can do this.

Often when we load a dataset for exploration we do not know exactly what we are looking for in the data. Visualization helps with this by enabling people to look at the data. Interaction gives them techniques to navigate through the data. One of these techniques is selection. Selection, combined with our multiple-coordinated view setup, provides users with a rich context and multiple perspectives on the items they are interested in. One of the insights we are looking for when we make a selection is

“which attribute separates this selection best from the non-selection”.

In other words: which attribute has specific values for the selection that are clearly different from the values of the non-selection? We can of course see this visually, in a scatterplot or histogram for example, but if we have thousands of attributes it quickly becomes cumbersome to check each attribute manually. We would like to have a ranking of the attributes. We can do this by computing the information gain or gain ratio. This seems like a good opportunity to test out TensorFlow.
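For reference, the quantities involved can be written out as follows. These are the standard decision-tree (C4.5-style) definitions for a binary selection and a single split point; it is an assumption that the implementation discussed here follows exactly these formulas. Here S is the set of all items, p_c is the fraction of items in class c (selected or non-selected), and S_{\le t} and S_{>t} are the items whose value for attribute A lies at or below, respectively above, the split point t.

```latex
\begin{align*}
H(S) &= -\sum_{c \in \{\mathrm{sel},\, \mathrm{non\text{-}sel}\}} p_c \log_2 p_c \\
\mathrm{Gain}(S, A, t) &= H(S)
  - \frac{|S_{\le t}|}{|S|}\, H(S_{\le t})
  - \frac{|S_{> t}|}{|S|}\, H(S_{> t}) \\
\mathrm{SplitInfo}(S, A, t) &=
  - \frac{|S_{\le t}|}{|S|} \log_2 \frac{|S_{\le t}|}{|S|}
  - \frac{|S_{> t}|}{|S|} \log_2 \frac{|S_{> t}|}{|S|} \\
\mathrm{GainRatio}(S, A, t) &= \frac{\mathrm{Gain}(S, A, t)}{\mathrm{SplitInfo}(S, A, t)}
\end{align*}
```

Dividing by the split information penalizes splits that carve off only a handful of items, which would otherwise look deceptively informative.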


We implemented the computation of the gain ratio in Python/TensorFlow and discuss the different parts below. The full source code is available at the bottom as an iPython notebook file. First we load the needed modules and define functions to compute the entropy, information gain, and gain ratio. Next we define some helper functions, for example to sort a matrix by one column, to find split points, and to count the number of selected versus non-selected items. Then we read the data and compute, for each attribute, the gain ratio and the corresponding split point.
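The steps listed above can be sketched in plain Python. To be clear, this is not the notebook’s TensorFlow code: it is a small standard-library approximation that assumes the usual entropy-based definition of gain ratio and takes midpoints between consecutive distinct values as candidate split points. The sample values are invented.

```python
import math

def entropy(flags):
    """Binary entropy (in bits) of a list of 0/1 selection flags."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

def gain_ratio(values, selected, split):
    """Gain ratio of splitting the items at `values <= split`."""
    left = [s for v, s in zip(values, selected) if v <= split]
    right = [s for v, s in zip(values, selected) if v > split]
    if not left or not right:
        return 0.0
    w_left, w_right = len(left) / len(selected), len(right) / len(selected)
    gain = entropy(selected) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info

def best_split(values, selected):
    """Best (gain ratio, split point) over midpoints of distinct values."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    scored = [(gain_ratio(values, selected, c), c) for c in candidates]
    return max(scored, default=(0.0, None))

# Invented example: 1 marks a selected item, 0 a non-selected one.
selected = [0, 0, 0, 1, 1, 1]
attributes = {
    "displacement": [90, 97, 98, 250, 300, 350],
    "year": [70, 82, 75, 71, 80, 78],
}

# Rank attributes by how well their best split separates the selection.
ranking = sorted(
    ((name, *best_split(vals, selected)) for name, vals in attributes.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, ratio, split in ranking:
    print(name, round(ratio, 3), split)
```

In this toy data a single threshold on displacement separates the selection perfectly, so it ranks first with a gain ratio of 1, while no threshold on year does nearly as well.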


Now let’s apply this to a dataset. We take a publicly available dataset[1] about car properties and load it into SynerScope. This dataset contains properties such as the weight of the car, the mpg usage, number of cylinders, horsepower, origin, etc. Now we wonder what separates the American cars from the European and Japanese cars. From the histogram in SynerScope Marcato we select the American cars and run the gain ratio computation.

American Cars

Attribute      gainRatio          splitPoint
displacement   0.116024601751     97.25
mpg            0.0969803909296    39.049
weight         0.0886271435463    1797.5
cylinders      0.08334618870      4.0
acceleration   0.0801976681136    22.850
horsepower     0.0435058288084    78.0
year           0.00601950896808   79.5

We see that displacement and mpg are the most differentiating factors for American cars. We can verify this by plotting these on a scatterplot. See the figure below; the orange dots are the American cars.

We could also take the cars from 1980 and thereafter and see what separates them most from the other cars. Here we see that besides year, the miles per gallon usage and cylinders are the most differentiating factors. Again we see this in the scatterplot.

Cars produced after 1980

Attribute      gainRatio         splitPoint
year           0.338440834596    79.5
mpg            0.113162864283    22.349
cylinders      0.100379880419    4.0
horsepower     0.0872011414011   132.5
displacement   0.0866493413084   232.0
weight         0.0861363235593   3725.0
acceleration   0.0501698542653


As the key focus of TensorFlow is on deep learning and neural networks, it can sometimes require some creativity to handle more generic computation, such as the information gain metric we used as an example. By using a hybrid approach where data is moved between TensorFlow structures and numpy arrays, we were able to make a performant implementation. We are anxiously monitoring further developments, as it is a fast-moving platform, and we hope that some features that currently only exist on the numpy side, such as argsort, will be available in due time.

For now, the hybrid combination works well enough, and using TensorFlow for the computation and SynerScope Marcato for the visual exploration gives us a much faster route to understanding our data and discovering new patterns.


[1] Dataset:
[2] Source code (iPython notebook): InformationGain


SynerScope empowers Apache Spark on IBM Power8 to truly deliver deep analytics

Author: Jorik Blaas

Let’s start by introducing the three key components:

  1. SynerScope is a deeply interactive any-data visual analytics platform for Big Data sense-making.
  2. Apache Spark is a lightning fast framework for in-memory analytics on Big Data.
  3. IBM Power8 is a high-bandwidth low-latency scalable hardware architecture for diverse workloads.

In a world where the speed and volume of data are increasing by the day, being able to scale is an increasingly stringent demand. Scale is not only about being able to store a large amount of data; as data size grows, it also gets gradually more difficult to move data. In classic architectures, running analytics used to be something that you did in your analytic data-warehouse, and moving an aggregated, filtered or sampled dataset from your main storage into the data-warehouse was an acceptable solution. Now that analytics touches a growing number of data sources, each of ever increasing size, moving the data is less of an option.

To provide fast turnaround time in deep analytics, the computation has to be moved close to the data, not the other way around. Hadoop has brought this technology to general availability with MapReduce over the past half decade, but it always has remained a programming model that was difficult to understand, as the concepts originated in High Performance Computing.

Apache Spark is the game changer currently moving at incredible speed in this space, as it offers an unprecedented open toolkit for machine learning, graph analytics, streaming and SQL.

While most of the world is running Spark on Intel hardware, Spark as a technology is platform independent, which opens the doors for alternative platforms, such as OpenPOWER. IBM is heavily committed to developing Spark, as announced last June.

After building Apache Spark on our Power8 machine, we were able to instantly run our existing Python and Scala code. We noticed that the Power8 architecture is especially favorable towards jobs with a high memory bandwidth demand. Using a dataset of a five-year history of GitHub (100 GB of gzipped JSON files), we were able to churn through the entire set in under an hour, processing over 100 million events. After processing, we can load the resulting dataset into SynerScope for a deeper inspection.

The image below shows the top 100,000 most active projects, grouped by co-committers. Projects that share committers are close to each other. Interestingly, this type of involvement-based grouping shows very clearly how different programmer communities are separated. The island of iPhone development (in orange) is really isolated from the island of Android developers.
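The co-committer grouping can be illustrated at toy scale without Spark. The sketch below reads gzipped JSON lines, collects the set of committers per project, and counts how many committers each pair of projects shares. The flat `repo` and `actor` fields are a simplifying assumption about the event layout, and the sample events are invented; on the real dataset the same group-by-key step would be expressed as a Spark job rather than an in-memory loop.

```python
import gzip
import io
import json
from collections import defaultdict
from itertools import combinations

def committers_per_repo(gz_bytes):
    """Read gzipped JSON lines and collect the set of actors per repo."""
    committers = defaultdict(set)
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            event = json.loads(line)
            committers[event["repo"]].add(event["actor"])
    return committers

def shared_committers(committers):
    """Number of shared committers for each pair of repos that share any."""
    shared = {}
    for a, b in combinations(sorted(committers), 2):
        overlap = len(committers[a] & committers[b])
        if overlap:
            shared[(a, b)] = overlap
    return shared

# Invented sample events, compressed in memory to mimic the input files.
events = [
    {"repo": "a/ios-app", "actor": "ann"},
    {"repo": "a/ios-lib", "actor": "ann"},
    {"repo": "b/android-app", "actor": "bob"},
    {"repo": "b/android-lib", "actor": "bob"},
    {"repo": "a/ios-lib", "actor": "cat"},
]
gz_bytes = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))

links = shared_committers(committers_per_repo(gz_bytes))
print(links)
```

The resulting pairs and weights form exactly the kind of edge list a layout algorithm can turn into islands of related projects. Comparing all pairs is quadratic in the number of projects, which is one reason a distributed engine is needed at the scale described above.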

With Spark on Power8, we were able to handle a huge dataset, reduce it to its key characteristics, and make sense of complex mixed sources.