Mathematica Data Visualization by Nazmus Saquib, Packt Publishing Book Review

I love data, math and stats, and I love to visualize information. But Mathematica has always been, and remains, the forbidden fruit to me (due to its cost). Therefore I was very glad to accept an opportunity to review Mathematica Data Visualization. The book lived up to its promise. I liked every page of it. The book is packed with insightful examples of producing all (or most) of the common graphs and charts a data scientist may need in one's professional life. I initially expected to read the book over a few weeks, but finished it in one evening. It's a pity I could not install comparable software on my Linux laptop to try the visualizations out myself. Still, I was touched by the sheer range of visualization offerings. Mathematica seems to be the only one that offers Paired Histograms, and I wasn't expecting it to feature full support for geo-visualizations (maps and paths).
Nazmus is a super skilled professional, and he surely brings a lot to the table, I mean to the reader of the book. I would like to see another book from him on actual programming in Mathematica, covering a real project.
It is a very high quality book with many reference links to other relevant literature, which will enable you to expand your knowledge further. In my view it is a good investment at ~$22, and it can actually serve as an evaluation of using Mathematica in your next project or research.
Again, a good book, 5 out of 5.
Disclaimer: I received a free ebook version of this book for review purposes from the publisher.

Kali Linux Network Scanning Cookbook by Justin Hutchens, Packt Publishing Book Review

With the advent of constantly connected computers (not only via the Internet) the attack surface has increased immensely. At the same time, user machines became as powerful as servers used to be. However, very little was done to educate computer professionals to detect, prevent and cope with intrusions or penetration attacks. Kali Linux Network Scanning Cookbook can serve very well to close that gap.

A little on Kali Linux: it is a specialized distribution for penetration testing and forensics. It has gained a lot of traction lately in pen-testing and security pros' circles. One of its less subtle features is that it always runs with superuser rights, and it is typically installed as a VM guest OS.

Let me tell you, I was shocked the book counted 450+ pages! That is how much insight and tooling has actually been created to harden your computing infrastructure (jargon used by security pros, meaning making your systems less vulnerable). It is hard to imagine an average practitioner would harness each of them in daily use, but I would strive to, and I advocate looking in detail into every applicable offering. Besides, the book uses a lot of Python code, so readers should make sure they are familiar and comfortable with some coding, as well as with basic text editors such as Nano and VI. If the reader decides to prepare for the book, I would recommend a good book on learning Python and, for VI, Hacking VIM from Packt.

Justin, a very experienced professional, delivers the material in great detail and depth, with a lot of examples and in a very digestible format. If I am not mistaken, several dozen tools are covered in his book. I think this is unprecedented!

I liked the chapter on TCP scanning the most; it was both fun and insightful. The other topic I enjoyed, and trust I can apply at work as a data guy, is SQL injection with sqlmap. Fingerprinting and host OS detection were new to me. Familiar tools discussed in the book, ones you can hardly avoid getting in touch with, are Metasploit, fping and SNMPWalk. Frankly, most tools were totally new to me, and I gained a lot of knowledge from this book.

I rate this book 5 out of 5: there seems to be nothing that could be taken out of the book or added to it. A great read, highly recommended!

Disclaimer: I was given a free electronic copy of the book by the publisher for the sole purpose of publishing a review.

Learning Java 8 By Mike Kelly, O'Reilly Media Video Review

I chose to learn Java 8 because of all the latest buzz around it. It turns out the rumours of Java's demise have been greatly exaggerated. So Learning Java 8 could be, and I trust indeed is, a very good investment in your career.

I need to start by saying that Mike Kelly is a great lecturer. Mike is what every teacher should be: he has a clear, well-paced voice and delivers the material without rushing ahead.
The course is quite basic, but you will be able to build a near real-life Java desktop console application. If you already have some basic knowledge of writing in any other procedural language, you will mostly gain insight into how to use Eclipse to be productive coding in Java.
Some basics get repeated a few more times than I would like, but repetition is good for learning. As I hinted, you will not learn any CRUD database operations here, nor are RESTful or web applications covered, but regardless, Mike teaches enough for you to wrestle with these specifics yourself. After all, since Java is very much alive, there is a huge user community that is always willing to help you on forums or IRC chats.
If I were in a position to advise on the material's preparation, I would suggest rearranging some sections so that, for example, for loops and if statements are covered before classes or unit tests.
Another piece of advice to Mike would be to pause more often, allowing for exercises on one's own machine.
All in all, it is great teaching material for newcomers to the Java world, let me stress that.
I think it deserves a 4 out of 5 rating.
Disclaimer: I received a free version of this video for review purposes as part of the O'Reilly Reader Reviewer program.

PostgreSQL: Up and Running by Regina Obe, O'Reilly Media Book Review

A Practical Guide to the Advanced Open Source Database

As I mentioned in my previous blog post, databases have lately been in the spotlight, left and right. This was the primary reason for me to choose yet another book on databases for review*. I know that NoSQL data stores are trendier for now, but traditional RDBMSs will not give up their sheer install base quite so easily yet. The secondary reason was that, while I work full time with SQL Server, I suspected I might be missing something by not getting familiar with what most IT pros would call the competition.

Indeed, having a backdoor, or more correctly a mechanism allowing custom extensions (packaged as extensions via CREATE EXTENSION since PostgreSQL 9.1) to be baked into the database engine, allows taking PostgreSQL to new heights without going through costly upgrades. One of the interesting ones (at least to me) is the key-value store called HStore. Just for reference, starting with SQL Server 2014 the In-Memory engine is part of the core database. Did I mention the RDBMSs don't give up just yet?

The book mentions so many different versions of PostgreSQL so many times that at times my head was spinning, trying to recall what is used in which version and what differs. After finishing the book I started to suspect it would have been better for the author to concentrate on the latest version, because the previous builds are so different. Overall, I fail to grasp what the main objective of this book was. The coverage is sparse and not in depth; as a result the book is quite short, and you will likely need to buy another book or turn to various forums or IRC chats.

Well, the book has answered my primary and secondary interests, and it seems that I am not yet the biggest fan of PostgreSQL as a database engine. Why? Probably because I am too spoiled by SQL Server's install-and-forget way of operating. As a database developer and DBA, having to restart the database after a simple security file modification, or to set the memory via SHMMAX, or threads for multiple backup restores, makes me chuckle.

However, PostgreSQL has many advantages too, I admit. What I liked is the ability to back up a single table, and having backups restorable to any version of the database engine is a big plus. Not to mention triggers on views, unlogged tables and exclusion constraints. Read the book to learn a lot more.

When it comes to the book itself, Regina and Leo did a fantastic job; they know the product really well. 5 out of 5 is my mark.

*Disclaimer: I received this book for free in exchange for a review as part of the O'Reilly Reader Review Program.

Mastering DynamoDB by Tanmay Deshpande, Packt Publishing Book Review

Databases are very much in the spotlight lately, especially the NoSQL breed. While there are dozens of offerings on the market, only a handful top the list; one such offering in the key-value area is Amazon's DynamoDB. Being a close relative of such popular players in this arena as Redis or Voldemort, DynamoDB, I figured, has many unique points, add-ons and strong backing by the user community, not only by the mighty Amazon corporation. Mastering DynamoDB came out at a very strategic time.

It is a great technical read, too. Tanmay (the author) walks you gently into the wonderful world of NoSQL databases. Then the book arms you with DynamoDB, turns you into a fearless traveller sailing the high seas of today's turbulent and fierce data streams, and has you prowl the dark alleys of handling data in the Cloud.

The book is structured so that its first several chapters are devoted to the nitty-gritty of DynamoDB, after which it explains best practices and the best usage scenarios. The book has an advanced chapter for those who like the extremes. For example, relational integrity is suddenly discussed in a book about NoSQL (there is supposed to be no schema or structure at its core, but not so fast). The book tastefully ends with an overview of the top 10 or so of the many third-party offerings from either Amazon itself or GitHub contributors.
The ones I liked best are DynamoDB Local and the ability to conduct transactions. The module that allows scaling the database appeared to be of great value, but frankly I was surprised it is not written by Amazon itself. What's more, the design decision of making a developer (or perhaps an admin) responsible for assigning and provisioning throughput capacity for each table raised my eyebrows.

The author appears very savvy in the subject of Cloud Data (perhaps I coined that term). I actually learned quite a few interesting techniques and found out that Amazon has SLAs for each component, even for their internal systems, and especially for such a crucial piece as DynamoDB. And they are tight SLAs, yet they make a lot of sense to me. Nobody argues that Amazon does not successfully process huge volumes of data, fast.

Anyway, I liked the book and the author very much, heck, perhaps even more than DynamoDB as a database itself.

It's 5 stars out of 5.

Getting Started with Impala by John Russell, O’Reilly Media Book Review

Interactive SQL for Apache Hadoop

Impala is a recent but very valuable addition to the Hadoop ecosystem. I must say (after reading the book) that Cloudera made a big step in the right direction.

The rationale behind bringing Impala to life is the proliferation of SQL. SQL as a language has many flavours, but in one form or another it is already known to data practitioners coming to Hadoop from various platforms and DBMSs. Impala implements a subset of the ANSI-92 SQL specification; even so, the subset is powerful enough to make a developer productive. In my opinion, since SQL is based on algebra and sets, and because HDFS (Hadoop) is able to expose datasets, Impala is the right choice for DML and DDL even for Big Data projects.

At 110 pages the book is not terribly long, but bear in mind that Impala as a product is still under active development. As a bonus, the author works at Cloudera and has a close relationship with the product, which is a big plus resulting in top professional content. John structured the book in two parts: the first and largest is on Impala's implementation and its role in data analysis and processing; the second covers the most commonly used tasks, pitfalls, and general advice and techniques.

What I did not find is more on how to use it with Hive, Sqoop, HBase and Pig; I will take a star off my rating for this.

Let me reiterate: the book covers Cloudera's Hadoop distribution with Impala; if you are using a different distribution, Impala may not be part of it.

As I said, I am giving this book 4 out of 5 stars. Good work, John!

Disclaimer: the book was provided to me for free as part of O’Reilly’s blogger reviewer programme.

How to get the maximum row size of a SQL Server table

I do a lot of ETL work, and one of the particulars I want to know when planning my package design is the maximum length of a table row.

To expand further, it is often prudent to know the row size for future performance tuning or for capacity planning.

So, without further ado, here is the SQL code (works on SQL Server 2005 and onward):

```sql
DECLARE @table_name NVARCHAR(256),
        @object_id  INT,
        @1stCol     NVARCHAR(258),
        @sql        NVARCHAR(MAX);

-- Initialize the table name to sample (optionally schema-qualified, e.g. 'dbo.MyTable')
SET @table_name = 'THE TABLE NAME';
SET @object_id = OBJECT_ID(@table_name);

-- Pick the first column (lowest column_id) to number the rows by
SELECT TOP 1
    @1stCol = QUOTENAME(name)
FROM sys.columns
WHERE object_id = @object_id
ORDER BY column_id;

-- If you need the sizes of all rows (say, for an average), drop the TOP N clause
SET @sql = 'SELECT TOP 1 ' + @1stCol + ', ROW_NUMBER() OVER (ORDER BY ' + @1stCol + ') AS [Record Number], (0';

-- Add up the byte length of every column; a NULL column is counted as 1 byte
SELECT
    @sql = @sql + ' + ISNULL(DATALENGTH(' + QUOTENAME(name) + '), 1)'
FROM sys.columns
WHERE object_id = @object_id;

SET @sql = @sql + ') AS [Row Size in Bytes] FROM ' + @table_name + ' ORDER BY [Row Size in Bytes] DESC;';

-- Optionally, print the statement
PRINT @sql;

-- Execute
EXEC (@sql);
```

Using my code you can find the average row size or calculate the total table size (e.g. using Excel).
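If you prefer to get the average and total figures straight from SQL Server rather than Excel, the same per-row expression can be wrapped in aggregates. The following is a minimal sketch under stated assumptions, not a definitive tool: it reuses the script's rule (DATALENGTH per column, a NULL counted as 1 byte), so it measures data bytes only, not row headers, page overhead or indexes. 'dbo.YourTable' is a hypothetical placeholder, and the AVG result is truncated to whole bytes because the expression is a BIGINT.

```sql
-- Sketch: row count plus average, maximum and total data bytes per row.
-- 'dbo.YourTable' is a hypothetical placeholder; use your own table name.
DECLARE @table_name NVARCHAR(256),
        @object_id  INT,
        @from       NVARCHAR(600),
        @row_expr   NVARCHAR(MAX),
        @sql        NVARCHAR(MAX);

SET @table_name = N'dbo.YourTable';
SET @object_id  = OBJECT_ID(@table_name);
-- Start from a BIGINT zero so the whole sum is BIGINT and SUM cannot overflow INT
SET @row_expr   = N'CAST(0 AS BIGINT)';

-- Safely quoted two-part name of the table
SELECT @from = QUOTENAME(SCHEMA_NAME(schema_id)) + N'.' + QUOTENAME(name)
FROM sys.objects
WHERE object_id = @object_id;

-- Same per-row expression as the script: DATALENGTH of each column, NULL counted as 1 byte
SELECT @row_expr = @row_expr + N' + ISNULL(DATALENGTH(' + QUOTENAME(name) + N'), 1)'
FROM sys.columns
WHERE object_id = @object_id;

SET @sql = N'SELECT COUNT_BIG(*) AS [Row Count], '
         + N'AVG(' + @row_expr + N') AS [Avg Row Size in Bytes], '
         + N'MAX(' + @row_expr + N') AS [Max Row Size in Bytes], '
         + N'SUM(' + @row_expr + N') AS [Total Data Bytes] '
         + N'FROM ' + @from + N';';

PRINT @sql;
EXEC (@sql);
```

If the table name does not resolve, @from stays NULL, the whole statement becomes NULL and nothing is executed, so check the PRINT output first.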

Cloudera Administration Handbook by Rohit Menon, Packt Publishing Book Review

Cloudera Administration Handbook is another great book of the kind I call a 'desk companion', and a must for a beginner Cloudera administrator.

Rohit, a person from "the trenches", writes with a well-balanced ratio of material to feature coverage and expands on exactly what a Hadoop admin needs and should be using among the Cloudera offerings in this area of expertise to successfully accomplish one's day-to-day tasks.
However, it is actually a lot more than just an admin's book: it also teaches how to install most of the Cloudera Hadoop ecosystem components, which components are typically used for what in a business, and how to configure each. All of that is done in a thorough, precise and professional manner without any extra fuss or foofaraw.

I liked that the author expanded, briefly but nicely, on the new features in Hadoop 2.0. For me, the coverage of MapReduce appeared the most valuable; I admit it is a rough area of Hadoop.

The troubleshooting part is the one to read and re-read, along with high availability, backup, balancing and security. Especially the Kerberos setup: I deem it a very necessary yet rarely covered topic that also appears very hard to understand, at least to me, but it was very much worth going through. As an aside, the CDH distribution is so extensive and feature rich that it is no wonder a whole book can be dedicated to just this topic. Having read the book, I must say Cloudera Manager is an awesome tool to have on board; it is a great helper, but it requires a good book such as Cloudera Administration Handbook by Rohit Menon to get acquainted with.

Have it beside you, at your desk.

Five out of five stars.

SQL Server 2014 Development Essentials by Basit A. Masood-Al-Farooq, Packt Publishing Book Review

Being “stuck” with SQL Server as a heavy user for over 16 years makes my heart tick each time I see a new book or any other reference released. Naturally, I was glad to hear from a Packt Publishing representative about the opportunity to review a fresh off the press (or imaging) book, SQL Server 2014 Development Essentials (publisher’s book site), by a person as trusted in the #SQLFamily as Basit.

I read it in one large gulp, as the book is not lengthy at 170 or so actually useful pages. The material is written in a concise, clear manner. Still, I expected at least as many pages again for such a complex and feature-rich product.

But what did the book promise? The primary goal is for the reader to develop enough skills to deliver a successful database application.

The book targets database developers, administrators and architects.

However, the book deserves a lot of criticism. For example, the many-to-many relationship in the book is represented in the form of two tables; unfortunately, a true many-to-many relationship in an RDBMS cannot be achieved without an intermediate, third table. This will make many folks upset, so I have submitted errata, but I can’t tell how Packt shares them with all readers. I shall continue on this note and say that an even greater flaw exists in this book: overall, it does not provide enough guidance, advice or references. I mean, if a topic, say locking, is covered, why would the author not advise on which locking option to use under what circumstances? The same applies to most topics. Furthermore, I was surprised that almost nothing was covered about a database operating in the Cloud (Azure), CLR functions or CDC, and there is no mention of Service Broker, Master Data Management, Data Quality, etc.; the same is true of many more canned features (just too many to mention). Without the aforesaid, this book is of much less help to software architects and incomplete for developers. SQL Profiler, whose use is no longer really advocated, is covered beside the Dynamic Management Views, whereas I expected database tuning and troubleshooting to be a separate chapter of its own.
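To illustrate the point about the third table, here is a minimal T-SQL sketch of a many-to-many relationship. The Student/Course tables and their columns are hypothetical, chosen only for illustration. The junction table holds one row per pairing, its two foreign keys point at the parent tables, and its composite primary key prevents the same pair from being stored twice.

```sql
-- Sketch: a many-to-many relationship between students and courses.
-- Table and column names are hypothetical, for illustration only.
CREATE TABLE dbo.Student (
    StudentID INT           NOT NULL PRIMARY KEY,
    Name      NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.Course (
    CourseID INT           NOT NULL PRIMARY KEY,
    Title    NVARCHAR(100) NOT NULL
);

-- The third (junction) table: each row links one student to one course,
-- so a student can take many courses and a course can have many students.
CREATE TABLE dbo.StudentCourse (
    StudentID INT NOT NULL REFERENCES dbo.Student (StudentID),
    CourseID  INT NOT NULL REFERENCES dbo.Course (CourseID),
    CONSTRAINT PK_StudentCourse PRIMARY KEY (StudentID, CourseID)
);
```

With only two tables, one side would have to carry a single foreign key, which limits it to one-to-many; the junction table is what lifts that restriction.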

In short, I am disappointed this time. I just fail to see what gap this book closes, and how it is any better than simply reading about product features on Microsoft’s site, then going more in depth in BOL, MSDN and the blog posts of the most prominent SQL Server industry leaders.

Two stars out of five, because it may serve as a guide or be read as a preamble to developing a new SQL Server based database.

Disclaimer: this book was given to me for free by Packt Publishing in exchange for publishing a timely review.

SQL Server 2014 Business Intelligence Development Beginner’s Guide by Reza Rad, Packt Publishing Book Review

Microsoft SQL Server 2014 Business Intelligence Development Beginner’s Guide

I was very happy to hear that a new book by my fellow MVP Reza Rad had been released by Packt Publishing, and even happier when I heard I had an opportunity to review it, because I could not expect anything other than superb content this time, too.

It lived up to its promises!

The book, even though it is marketed under the “Beginner’s Guide” moniker, is actually suitable for an intermediate Business Intelligence professional. Judge for yourself: the book covers such advanced topics as:

  • Data Modeling using SSAS Multidimensional and Tabular with MDX and DAX
  • Manage Master Data with MDS
  • Reveal Knowledge Driven Data Quality with DQS
  • Understand prediction and Data Mining
  • Identify data patterns
  • Design Dashboards with PerformancePoint and Power View
  • Explore Power Query and Power Map as components of Power BI
  • Create powerful reports using Visual Studio


I think this list speaks for itself; it is an impressive, unprecedented coverage of topics.

A little more about the book: it is divided into sections that first acquaint the reader with the advantages of using a specific technology, then take a short, deeper dive into an exact usage example, summed up with a detailed account of how the task was tackled and what the reader's take-aways are. The book has high quality graphics, and I read it on my laptop as well as on an Android tablet. All that makes this book a winner for absorbing a lot of technical content in a short time and making it stick.

The book references enough external resources in case one would like to explore any topic further. In my opinion, the book is a huge time saver for getting familiar with the latest Microsoft BI offerings and would easily allow anybody, even someone new to Power BI or DQS, to enter Proof of Concept mode or make a quick presentation for a project stakeholder.

I liked the content on predictions the most, along with the cool visualizations that can be done with Power Map and Power View. My personal opinion: Power BI overall, as a data visualization and analysis platform, is just sassy!

A few notes to a prospective reader: one needs MS Office 2013 installed, access to Microsoft Server 2012 and an active Azure account, and should preferably run the whole software suite in a virtual machine.

Verdict: I am giving this book a 5 out of 5 rating.