
Data Lakes, Buckets and Coffee Cups

Author: Monique Hessling

Over the past few years, primarily large carriers, and especially the more "cutting edge" ones (for all the doubters: yes, there is such a thing as a cutting-edge insurer), have invested in building data lakes. The promise was that these lakes would enable them to more easily use and analyze "big data" and gain insights that would change the way we all do business. Change our business for the better, of course: more efficiency, better customer experiences, better products, lower costs. In my conversations with all kinds of carriers, I have learned that I am not the only one who struggles to totally grasp this concept:

A midsize carrier’s CIO in Europe informed me that his company was probably too small for a whole lake, and asked me if he could start with just a “data bucket”. His assumption was that many buckets would ultimately constitute a lake. Another carrier’s CIO explained to me that she is the proud owner of a significant lake. It is just running pretty dry, since she analyzes, categorizes and normalizes all data before dumping it in. She explained that she was filling a big lake with coffee cups full of data. It would take her a long time to get that lake filled.

You might notice that these comments all dealt with the plumbing of a big data infrastructure; the carriers had not yet touched on analytics and valuable insights, let alone on operationalizing insights or measurable business value. Many carriers seem to be struggling with the classic pain point of ETL, even in this new world.

By digging into this issue with big data SMEs, I learned that this ETL issue is more a matter of perception than a technological problem. Data does not have to be analyzed and normalized before being dumped into a lake, and it can still be used for analytical exercises. Hadoop companies such as Hortonworks, Cloudera or MapR, or integrated cloud solutions such as the recently announced Deloitte/SAP HANA/AWS solution, provide at least part of the solution to dive and snorkel in a lake without restricting oneself to dipping a toe into a bucket of very clean and much-analyzed data.
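To make that "dump first, structure later" idea a bit more concrete, here is a minimal schema-on-read sketch. It assumes a Spark environment (the approach is similar on any of the Hadoop distributions mentioned above) and a hypothetical file of raw claim records; the path and field names are illustrative, not from any real carrier.

```python
# Minimal schema-on-read sketch (assumes PySpark is installed and that a
# hypothetical file of raw, un-normalized claim records exists at the path below).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Dump-first: read the raw JSON as-is. Spark infers a schema at read time,
# so no upfront ETL or normalization is needed before landing the data in the lake.
raw_claims = spark.read.json("s3://my-data-lake/raw/claims_raw.json")  # hypothetical path

# Apply structure only at the point of analysis: select and cast the fields
# this particular question needs, and ignore the rest of the messy payload.
claims_by_region = (
    raw_claims
    .withColumn("paid_amount", F.col("paid_amount").cast("double"))
    .groupBy("region")
    .agg(F.sum("paid_amount").alias("total_paid"))
)

claims_by_region.show()
```

The point of the sketch is simply that the cleanup effort is deferred to the query, not performed coffee cup by coffee cup at intake.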

And specialized firms such as SynerScope can save weeks, months or even longer of filling that lake with coffee cups of clean data, by providing capabilities to fill lakes with many different types of data fast (often within days) and at low cost. Adding their specialized deep machine learning capabilities to these big data initiatives allows for secure, traceable and access-controlled use of “messy data” and creates quick business value.

Now, for all of us data geeks, it feels very uncomfortable to work with, or enable others to work with, data that has not been vetted at all. But we will have to accept that, with the influx of the massive number of disparate data sources carriers want to use, it will become more and more cost- and time-prohibitive to check, validate and control every piece of data at the point of intake into the lake. Isn’t it much smarter to take a close look at data at the point where we actually use it? Shifting our thinking that way, coupled with the technology now available, will enable much faster value from our big data initiatives. I appreciate that this is a huge shift in how most of us have learned to deal with data management. However, sometimes our historical truths need to be thrown overboard and into the lake before we can sail to a brighter future.
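For those who want to see what "checking data at the point of use" might look like in practice, here is a hedged sketch that continues the earlier example. It applies only the lightweight, question-specific checks that one analysis needs, instead of vetting every record at intake; the field names and thresholds are assumptions for illustration.

```python
# Sketch of validating data at the point of use rather than at intake
# (assumes the same Spark session and the hypothetical raw_claims frame above).
from pyspark.sql import functions as F

# Question-specific checks, applied only to the slice being analyzed:
# drop records missing the fields this analysis needs and exclude implausible values.
usable_claims = (
    raw_claims
    .filter(F.col("claim_id").isNotNull() & F.col("paid_amount").isNotNull())
    .filter(F.col("paid_amount") >= 0)
)

# Keep the rejects visible rather than silently cleansing the whole lake upfront.
rejected = raw_claims.count() - usable_claims.count()
print(f"Set aside {rejected} records that fail the checks for this analysis only.")
```

The rest of the lake stays raw and untouched; the next analysis brings its own checks for its own fields.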