Friday, March 28, 2014

White Paper (eBook) - DB2 11 : The Database for Big Data & Analytics

This is the fourth edition of this eBook and it's as valuable as ever. It's divided into four sections, with an introduction by Surekha Parekh:
  • 'DB2 11 for z/OS : Unmatched Efficiency for Big Data and Analytics' by Julian Stuhler,
  • 'Improved Query Performance in DB2 11 for z/OS' by Terry Purcell,
  • 'IBM DB2 Utilities and Tools with DB2 11 for z/OS' by Haakon Roberts and
  • 'How DB2 for z/OS Can Help Reduce Total Cost of Ownership' by Cristian Molaro 
I hardly know where to start with its contents. There's so much interesting information that I can only say: download the book and read it. You won't be disappointed. I can tell you I wasn't.

Since the title puts the focus on Big Data, perhaps I can tell you a bit more about that part. As Surekha Parekh states in the introduction: "There has been an enormous explosion of data: 90 percent of the world’s data has been created over the past two years! We have also seen a rapid growth in the volume, variety, and velocity of data due to the explosion of smart devices, mobile applications, cloud computing, and social media. New technology innovations, hunger for data, and the thirst for business analytics signal that we are entering a new era of computing —Smarter Computing — the era of Insight for Discovery (...) Much of this data growth has been in unstructured data" (think in terms of Facebook, Twitter, Google searches, YouTube) "IDC estimates that by 2020, business transactions on the Internet - business-to-business and business-to-consumer - will reach 450 billion per day. This phenomenon of data explosion is called big data, and smart organizations are looking for innovative ways to collect, analyze, and turn this data into actionable insights and make predictions." So where does DB2 (structured data) come into play here?

As Brian Proffitt states in this article: "You can't have a conversation about Big Data for very long without talking about Hadoop. But what is Hadoop, and what makes it so important?" Do read the article. It's a good introduction to Hadoop. And if you want the short definition, here's one from Webopedia:
"Hadoop, formally called Apache Hadoop, is an Apache Software Foundation project and open source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured data and unstructured data. Given its capabilities to handle large data sets, it's often associated with the phrase big data."
Returning to our white paper, here's how the structured (DB2) and unstructured worlds get connected: "In response, a number of tools and techniques have emerged centered on the open source Hadoop framework, including IBM’s InfoSphere® BigInsights™ technology.

While these technologies address many of the challenges inherent in analyzing big data, they also introduce new ones for organizations wanting to gain new insights by integrating the analysis of big data with core operational information. As shown in" the figure below "DB2 11 delivers some highly significant new features to allow DB2 and Hadoop/BigInsights to work together and better leverage each platform’s respective strengths. This capability lets data flow in both directions between DB2 and BigInsights, as follows".

There's more detail about how data flows from DB2 to InfoSphere BigInsights and back again, but for that you'll have to turn to the eBook itself.

As I always say, just check it out!
