Posts

IBM & Hortonworks collaboration

Imagine the excitement here at SynerScope when Hortonworks and IBM announced their collaboration to offer an open source distribution on Power Systems. We work with both the Hortonworks Data Platform (HDP) and the Power8, with its 2 x 12-core 3 GHz CPUs, 2 x NVIDIA K80 GPUs and 1 TB of memory(!). Besides the obvious excitement from a technical point of view (high geek alert), this new partnership will enable us to better serve our customers.

For enterprise users running systems based on POWER8, the first microprocessor designed for big data and analytics, Hortonworks provides a new, cost-effective distribution option for running their big data and analytics workloads. This open source Hadoop and Spark distribution will complement the performance of Power Systems by allowing clients to quickly gain business insights from their structured and unstructured data. Adding SynerScope increases the efficiency and impact of this new combination, getting from data to insights even faster. With SynerScope they get an all-in-one solution: flexible, user-friendly, visual and at scale. It is not only about finding patterns, but about understanding them by bringing all data sources together in one single visual environment.

We will be the first to work with the new Hortonworks-Power8 combination, and we will keep you posted about solution launches with Hortonworks HDP and Power8.

 

IBM’s Power8 & SynerScope join forces

Watch the animated movie made by IBM and SynerScope.

Discover smart insights from all data types with next-generation data analysis to benefit your business, fast!

With the enormous computing power of the Power8 and the next generation analysis of SynerScope combined, we are able to process structured, unstructured and Dark Data fast.

  • Does your business have high transactional volumes?
  • Do you work with sensor signals (IoT)?
  • Do you work with digital photo and video?
  • Do you want to understand large-scale network behavior?
  • Do you have two or more factors at play in your business?

Watch what we can do

SynerScope empowers Apache Spark on IBM Power8 to truly deliver deep analytics

Author: Jorik Blaas

Let’s start by introducing the three key components:

  1. SynerScope is a deeply interactive any-data visual analytics platform for Big Data sense-making.
  2. Apache Spark is a lightning fast framework for in-memory analytics on Big Data.
  3. IBM Power8 is a high-bandwidth, low-latency, scalable hardware architecture for diverse workloads.

In a world where the speed and volume of data are increasing by the day, being able to scale is an increasingly stringent demand. Scale is not only about being able to store a large amount of data: as data size grows, it also gets gradually more difficult to move data. In classic architectures, analytics was something you ran in your analytic data warehouse, and moving an aggregated, filtered or sampled dataset from your main storage into the data warehouse was an acceptable solution. Now that analytics touches a growing number of data sources, each of ever-increasing size, moving the data is less of an option.

To provide fast turnaround time in deep analytics, the computation has to be moved close to the data, not the other way around. Hadoop has brought this technology to general availability with MapReduce over the past half decade, but it has always remained a programming model that was difficult to understand, as the concepts originated in High Performance Computing.

Apache Spark is the game changer currently moving at incredible speed in this space, as it offers an unprecedented open toolkit for machine learning, graph analytics, streaming and SQL.

While most of the world is running Spark on Intel hardware, Spark as a technology is platform independent, which opens the doors for alternative platforms, such as OpenPOWER. IBM is heavily committed to developing Spark, as announced last June.

After building Apache Spark on our Power8 machine, we were able to instantly run our existing Python and Scala code. We noticed that the Power8 architecture is especially favorable towards jobs with a high memory bandwidth demand. Using a dataset containing a five-year history of GitHub (100 GB of gzipped JSON files), we were able to churn through the entire set in under an hour, processing over 100 million events. After processing, we can load the resulting dataset into SynerScope for a deeper inspection.
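To make the shape of such a job concrete, here is a minimal single-machine sketch in plain Python, not the Spark code we ran. It counts events per repository across gzipped JSON-lines files using a map step per file and a reduce step that merges the partial results, which is the same pattern Spark distributes across cores. The field layout (`repo.name`), the helper names and the sample events are assumptions made for illustration only.

```python
import gzip
import io
import json
from collections import Counter
from functools import reduce


def map_file(raw_gzip_bytes):
    """Map step: count events per repository in one gzipped JSON-lines blob."""
    counts = Counter()
    with gzip.open(io.BytesIO(raw_gzip_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            # Assumed field layout: {"repo": {"name": "owner/project"}}
            repo = event.get("repo", {}).get("name")
            if repo is not None:
                counts[repo] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-file counters into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


def make_sample(events):
    """Build an in-memory gzipped JSON-lines blob (hypothetical sample data)."""
    return gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))


file_a = make_sample([
    {"type": "PushEvent", "repo": {"name": "alice/app"}},
    {"type": "PushEvent", "repo": {"name": "bob/lib"}},
])
file_b = make_sample([
    {"type": "IssuesEvent", "repo": {"name": "alice/app"}},
])

totals = reduce_counts(map_file(f) for f in (file_a, file_b))
print(totals.most_common())
```

Because each file is mapped independently and the partial counters are merged afterwards, the per-file work can run in parallel without any coordination until the final merge.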

The image below shows the top 100,000 most active projects, grouped by co-committers. Projects that share committers are close to each other. Interestingly, this type of involvement-based grouping shows very clearly how different programmer communities are separated. The island of iPhone development (in orange) is really isolated from the island of Android developers.
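The idea behind grouping by co-committers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the layout SynerScope computes: it links two projects when they share at least one committer, then uses connected components as a crude stand-in for the "islands" in the image. The committer and project names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_committer_edges(commits):
    """Return {(project_a, project_b): number_of_shared_committers}."""
    projects_by_committer = defaultdict(set)
    for committer, project in commits:
        projects_by_committer[committer].add(project)

    edges = defaultdict(int)
    for projects in projects_by_committer.values():
        # Every pair of projects touched by the same committer gets a link.
        for a, b in combinations(sorted(projects), 2):
            edges[(a, b)] += 1
    return dict(edges)


def connected_groups(projects, edges):
    """Group projects that are linked by shared committers (union-find)."""
    parent = {p: p for p in projects}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in projects:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical (committer, project) pairs.
commits = [
    ("ann", "ios-kit"), ("ann", "ios-ui"),
    ("ben", "ios-ui"), ("ben", "ios-net"),
    ("cid", "droid-core"), ("cid", "droid-ui"),
]
projects = {project for _, project in commits}
edges = shared_committer_edges(commits)
groups = connected_groups(projects, edges)
print(groups)
```

In this toy input no committer works on both an "ios" and a "droid" project, so the two sets of projects end up in separate groups, which mirrors the isolation visible between the iPhone and Android islands.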

With Spark on Power8, we were able to handle a huge dataset, reduce it to its key characteristics, and make sense of complex mixed sources.