Geeks With Blogs

This is the *old* blog. The new one is at blog.sixeyed.com
Elton Stoneman

My latest Pluralsight course, Real World Big Data in Azure, is out now. It's about building Big Data solutions using .NET technologies.

The typical Big Data stack (think Kafka, Cassandra, Kibana and the Hadoop ecosystem) is heavily Java based. Microsoft have taken the best of those technologies and made them available as managed services in Azure, but also worked hard on integrating them with .NET. Today you can use those tried and trusted Big Data platforms in Azure with C#, PowerShell and Visual Studio.

In the course I build out a fairly typical Big Data solution where there are two separate processing paths - a deep storage component where all incoming events are permanently stored for later analysis, and a real time component where key metrics are extracted from the event stream and visualized in dashboards:

[Image: solution architecture with deep storage and real-time processing paths]

The demo solution looks at receiving events from mobile devices, and it's a simplified version of a real solution I delivered, which is currently processing about 500 million events per day, and is expected to scale up to 1 billion events per day by the end of the year.

Data comes into the system through a simple REST API which receives batches of events from devices. The API debatches them, enriches each event, re-batches them and sends them to Azure Event Hubs, the Azure messaging component that can run up to IoT scale, receiving millions of events per second.
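The debatch, enrich and re-batch flow can be sketched in a few lines. This is only an illustration, in Python rather than the C# used in the course, and every name in it (`debatch`, `enrich`, `rebatch`, `process`, the `deviceId` and `receivedAt` fields, the `send` callback) is an assumption made for the example. The actual call to Event Hubs is left out and stands behind `send`.

```python
from datetime import datetime, timezone
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional


def debatch(batches: Iterable[dict]) -> Iterator[dict]:
    """Flatten incoming device batches into individual events."""
    for batch in batches:
        for event in batch["events"]:
            # Copy the batch-level device ID down onto each event.
            yield {"deviceId": batch["deviceId"], **event}


def enrich(event: dict, received_at: datetime) -> dict:
    """Return a copy of the event with a server-side timestamp added."""
    return {**event, "receivedAt": received_at.isoformat()}


def rebatch(events: Iterable[dict], max_size: int) -> Iterator[list]:
    """Group events into lists of at most max_size, ready for sending."""
    iterator = iter(events)
    while chunk := list(islice(iterator, max_size)):
        yield chunk


def process(
    batches: Iterable[dict],
    max_size: int,
    send: Callable[[list], None],
    now: Optional[datetime] = None,
) -> int:
    """Debatch, enrich and re-batch, passing each outgoing batch to send.

    send is a hypothetical stand-in for whatever publishes to the event hub.
    Returns the number of events sent.
    """
    now = now or datetime.now(timezone.utc)
    enriched = (enrich(event, now) for event in debatch(batches))
    sent = 0
    for chunk in rebatch(enriched, max_size):
        send(chunk)
        sent += len(chunk)
    return sent
```

Passing `send` in as a callback keeps the batching logic separate from the messaging client, so it can be exercised with a plain list's `append` in tests.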

The course has a strong focus on applying the skills you already have to Big Data solutions, so in the first part of the project we build custom event processing components for storing events in Azure Blob Storage, and normalized metrics in SQL Azure. Those event processors are C# Worker Roles that run very efficiently, and they can process huge loads with minimal compute cost.
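As a rough idea of what "normalized metrics" can mean here, the sketch below rolls raw events up into per-minute counts by event name, which is the kind of compact row a relational table handles well. It is written in Python for brevity and is not the course's Worker Role code. The field names (`name`, `receivedAt`) and the one-minute bucket are assumptions for the example.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def minute_bucket(timestamp: str) -> str:
    """Truncate an ISO 8601 timestamp to the start of its minute."""
    parsed = datetime.fromisoformat(timestamp)
    return parsed.replace(second=0, microsecond=0).isoformat()


def normalize(events: Iterable[dict]) -> Counter:
    """Count events per (minute, event name) pair."""
    counts = Counter()
    for event in events:
        key = (minute_bucket(event["receivedAt"]), event["name"])
        counts[key] += 1
    return counts


def to_rows(counts: Counter) -> list:
    """Flatten the counts into sorted (minute, name, count) rows,
    the shape a metrics table insert would typically take."""
    return sorted((bucket, name, count) for (bucket, name), count in counts.items())
```

The raw events still go to deep storage unchanged. Only these aggregated rows would need to reach the database, which keeps the real-time store small.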

To make sense of the data in deep storage, I show how to use Apache Pig as a friendly way to run Hadoop map/reduce jobs on HDInsight, and also how to integrate Pig scripts with .NET projects, so you can use C# as part of your analysis pipeline where it's a better fit. And for the real time data visualization, I use dashing.net, which is an excellent and simple framework for building dashboards where the data can be updated using push or pull models.
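For readers new to map/reduce, this is a toy, single-process illustration of the model that a Pig group-and-count query is ultimately run as. It is plain Python, not Pig or Hadoop code, and the tab-separated line layout (device ID first, event name second) is an assumption for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple]:
    """Emit a (device_id, 1) pair for every stored event line."""
    for line in lines:
        device_id = line.split("\t")[0]
        yield device_id, 1


def shuffle(pairs: Iterable[tuple]) -> dict:
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Sum the values for each key to get an event count per device."""
    return {key: sum(values) for key, values in groups.items()}


def count_events_per_device(lines: Iterable[str]) -> dict:
    """Run the three phases in sequence over the input lines."""
    return reduce_phase(shuffle(map_phase(lines)))
```

On a real cluster the map and reduce phases run in parallel across many nodes and the shuffle moves data between them. A higher-level language like Pig saves you from writing the phases by hand.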

Towards the end of the course I look at an alternative solution which removes some of the custom components and uses more of the Azure HDInsight platform:

[Image: alternative architecture using Storm and HBase on HDInsight]

I replace the real time Worker Role with Apache Storm as the event processor, plugging into Event Hubs as the source, and cutting out all the custom plumbing code for handling events. I also look at replacing SQL Azure with Apache HBase, a massively scalable NoSQL database which lets you store your processed event data at a much finer level of detail, but still query it in real time.

Both Storm and HBase are available in Azure as managed components, each as a dedicated type of HDInsight cluster. The .NET integration is in the early stages, but you can build Storm topologies using a mixture of your own C# components and packaged Java components, and the .NET SDK for HBase is functional - if not fully featured.

Real World Big Data in Azure is just shy of 5.5 hours, and it's split across eight modules:

1.    Understanding Big Data in Azure
2.    Ingesting Data into Event Hubs
3.    Storing Event Data for Batch Queries
4.    Querying Batch Data in Deep Storage
5.    Normalising Event Data for Real-Time Queries
6.    Building Real-Time Dashboards
7.    Using Storm for the Plumbing 
8.    Using HBase for Storage

Since I started on this course, I've been doing a lot more work with Big Data, using .NET solutions running in Azure, so there will be more of this coming soon.

Posted on Monday, June 22, 2015 12:50 PM | Azure, Pluralsight, big-data


Comments on this post: Real World Big Data in Azure

# re: Real World Big Data in Azure
Eight modules is not a period that seems to me very unpleasant but I guess that after that I need some professional courses in addition, just to get very into this subject. Thank you for the tip, maybe I'll join your modules.
Left by Florence on Aug 07, 2015 11:06 AM


Copyright © Elton Stoneman | Powered by: GeeksWithBlogs.net