Hey Azure, how's my Cloud Service doing?

The Azure Management APIs give you a lot of useful stats and you can easily use them to build a customised dashboard for an instant healthcheck of your Azure estate.

Documentation is a little sketchy though, so this is the first of (probably) many posts detailing how to pull stats out for different Azure services. First off you'll need a management certificate – see How to programmatically upload a new Azure Management Certificate – and the rest of this post is about getting information about Cloud Services.

Cloud Services are the original PaaS solution on Azure, giving you Web and Worker roles for front-facing and background workloads. The newer Web Apps have some advantages:

much faster deployment
multiple deployment slots
easy config through the portal

But Cloud Services still have a place in many Azure architectures, because Web Apps have some significant disadvantages:

you can't independently scale staging slots (so no load testing your pre-production release)
only one Web App per Traffic Manager endpoint (so no load balancing across data centres)
SSL domain name enforcement (so you can't serve data from domain A with a cert for domain B - which is often handy in testing)

Currently my preference is to use Web Apps for dev and test environments and then Cloud Services for pre-prod and prod, and for Cloud Services I want to see stats like this:

From left to right I've got details about the service like the name, status and latest deployment label. Then the current CPU usage (average across all nodes), and the recent CPU trend. Then the scale - how many nodes I'm currently running, and what the auto-scale settings are at the moment.

The Management Libraries for Azure on NuGet give you all this data, and they're fairly straightforward to use, although at the time of writing they're in a bit of a muddle (the common namespace is changing from Microsoft.WindowsAzure to Microsoft.Azure and not all the libraries are in sync yet).

Cloud Service Details

Start by loading a CertificateCloudCredentials object with your management cert and subscription ID, and then you can get the basic status information from the ComputeManagementClient which will populate a HostedServiceGetDetailedResponse.Deployment object from the service name (the CloudService class is my simplified model):

using System.Linq;
using Microsoft.WindowsAzure.Management.Compute;
using Microsoft.WindowsAzure.Management.Compute.Models;

// get the deployment that's currently in the Production slot
HostedServiceGetDetailedResponse.Deployment deployment = null;
using (var computeClient = new ComputeManagementClient(_credentials))
{
    var service = await computeClient.HostedServices.GetDetailedAsync(_serviceName);
    deployment = service.Deployments.First(x => x.DeploymentSlot == DeploymentSlot.Production);
}

var model = new CloudService
{
    Name = _serviceName,
    DeploymentLabel = deployment.Label,
    Status = deployment.Status.ToString()
};

// current scale: number of running instances, and the size of the first one
model.CurrentScale = new CurrentScale
{
    Count = deployment.RoleInstances.Count,
    Size = deployment.RoleInstances.First().InstanceSize
};
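The CurrentScale above counts every instance in the deployment and takes the size of the first one. That works for a single-role service. If a deployment has both a Web and a Worker role, you might want a per-role breakdown instead. Here is a minimal sketch of that grouping. InstanceInfo, RoleScale and ScaleByRole are hypothetical names, and you'd first copy each role instance's role name and instance size into InstanceInfo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a role instance: the role it belongs to and its VM size.
public sealed class InstanceInfo
{
    public InstanceInfo(string roleName, string instanceSize)
    {
        RoleName = roleName;
        InstanceSize = instanceSize;
    }

    public string RoleName { get; }
    public string InstanceSize { get; }
}

// Per-role scale: how many instances a role is running, and at what size.
public sealed class RoleScale
{
    public RoleScale(string roleName, int count, string size)
    {
        RoleName = roleName;
        Count = count;
        Size = size;
    }

    public string RoleName { get; }
    public int Count { get; }
    public string Size { get; }
}

public static class ScaleByRole
{
    // Groups instances by role; the size is taken from the first instance in each group.
    public static List<RoleScale> Summarise(IEnumerable<InstanceInfo> instances) =>
        instances.GroupBy(i => i.RoleName)
                 .Select(g => new RoleScale(g.Key, g.Count(), g.First().InstanceSize))
                 .OrderBy(r => r.RoleName, StringComparer.Ordinal)
                 .ToList();
}
```

Taking the size from the first instance in each group mirrors the single-role code. Instances within a role normally share one size.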

Runtime Metrics

To get the CPU load, you need to use the MetricsClient, which surfaces metrics for various different services. Each type of service has different metrics available, and each metric has to be requested with a valid time grain and time range (e.g. the 'Percentage CPU' metric for Cloud Services can only be requested at specific time grains, and the start and end times need to be multiples of the time grain). See Neil Mackenzie's post Using Azure Monitoring Service API with Azure Virtual Machines for a very good explanation of how the metrics work.
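To keep a query's start and end times on multiples of the time grain, you could snap them down to a grain boundary before making the request. This is a minimal sketch under that assumption. TimeGrain, AlignDown and Window are hypothetical names of my own, not part of the management libraries. The sketch also assumes grains like 5 minutes or 1 hour that divide evenly into a day.

```csharp
using System;

// Hypothetical helper, not part of the Azure management libraries.
public static class TimeGrain
{
    // Snaps a UTC time down to the previous multiple of the grain.
    public static DateTime AlignDown(DateTime utc, TimeSpan grain)
    {
        if (grain <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(grain), "Time grain must be positive.");

        // DateTime ticks count from midnight on 0001-01-01, and grains such as
        // 5 minutes or 1 hour divide a day exactly, so this lands on clock boundaries.
        long remainder = utc.Ticks % grain.Ticks;
        return new DateTime(utc.Ticks - remainder, DateTimeKind.Utc);
    }

    // A (From, To) window of grainCount whole grains, ending at the last
    // grain boundary at or before nowUtc.
    public static (DateTime From, DateTime To) Window(DateTime nowUtc, TimeSpan grain, int grainCount)
    {
        var to = AlignDown(nowUtc, grain);
        var from = to - TimeSpan.FromTicks(grain.Ticks * grainCount);
        return (from, to);
    }
}
```

For a 15-minute view you'd call `TimeGrain.Window(DateTime.UtcNow, TimeSpan.FromMinutes(5), 3)` and use the two values as the from and to arguments. Aligning down leaves out the current, still-filling bucket, which is usually what you want for an average.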

In my dashboard I show the average load for the last 15 minutes, so the management API call looks like this:

var roleName = deployment.Roles.First().RoleName; // assumes a single role
var resourceId = ResourceIdBuilder.BuildCloudServiceResourceId(_serviceName, deployment.Name, roleName);

model.LatestCpu = new CpuMetric();
using (var metricsClient = new MetricsClient(_credentials))
{
    // request 5-minute buckets covering the last 15 minutes
    var to = DateTime.UtcNow;
    var from = to.AddMinutes(-15);
    var values = await metricsClient.MetricValues.ListAsync(resourceId, new[] { "Percentage CPU" }, "", TimeSpan.FromMinutes(5), from, to);

    // one value set per requested metric; take the first bucket that has an average
    var value = values.MetricValueSetCollection.Value.First();
    model.LatestCpu.Value = value.MetricValues.First(x => x.Average.HasValue).Average.Value;
    model.LatestCpu.MetricName = value.Name;
}
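The code above reads a single 5-minute bucket: the first one returned that has an average. If you'd rather show one figure for the whole 15 minutes, or explicitly pick the newest bucket, a helper along these lines would work. MetricBucket is a hypothetical stand-in for the library's metric value type. It assumes only a timestamp and a nullable Average, so you'd copy the API's values into it first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one time-grain bucket from the metrics API.
public sealed class MetricBucket
{
    public DateTime Timestamp { get; set; }
    public double? Average { get; set; }
}

public static class CpuSummary
{
    // Mean of the bucket averages that have data, or null if none do.
    // Buckets are weighted equally, which is reasonable when they share one grain.
    public static double? WindowAverage(IEnumerable<MetricBucket> buckets)
    {
        var averages = buckets.Where(b => b.Average.HasValue)
                              .Select(b => b.Average.Value)
                              .ToList();
        return averages.Count == 0 ? (double?)null : averages.Average();
    }

    // The most recent bucket that has data, or null if none do.
    public static double? Latest(IEnumerable<MetricBucket> buckets)
    {
        return buckets.Where(b => b.Average.HasValue)
                      .OrderByDescending(b => b.Timestamp)
                      .Select(b => b.Average)
                      .FirstOrDefault();
    }
}
```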

The ResourceIdBuilder gives you an ID that's suitable for the metrics API client. For Cloud Services, the ID is built from the service name, deployment name and role name - as metrics are collected at the role level.

Auto Scale Setup

Auto Scale also applies to many services, so there's a separate API client for it, the AutoscaleClient, which uses its own resource ID builder:

var autoScaleResourceId = AutoscaleResourceIdBuilder.BuildCloudServiceResourceId(_serviceName, roleName, true);
var model = new AutoScale();
using (var autoscaleClient = new AutoscaleClient(_credentials))
{
    // load the autoscale settings for the role and use the first profile
    var settings = await autoscaleClient.Settings.GetAsync(autoScaleResourceId);
    var profile = settings.Setting.Profiles.First();

    // the rule that adds instances
    var increaseRule = profile.Rules.First(x => x.ScaleAction.Direction == ScaleDirection.Increase);
    model.UpCount = int.Parse(increaseRule.ScaleAction.Value);
    model.UpMetricName = increaseRule.MetricTrigger.MetricName;
    model.UpThreshold = increaseRule.MetricTrigger.Threshold;

    // the rule that removes instances
    var decreaseRule = profile.Rules.First(x => x.ScaleAction.Direction == ScaleDirection.Decrease);
    model.DownCount = int.Parse(decreaseRule.ScaleAction.Value);
    model.DownMetricName = decreaseRule.MetricTrigger.MetricName;
    model.DownThreshold = decreaseRule.MetricTrigger.Threshold;
}

Details about the Auto Scale setup are in the ScaleRule class (one for scaling up and one for scaling down). They include the action to take (how many servers to add or remove), the cooldown period between triggers, and the metrics that cause the trigger to fire.
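With the thresholds in the model, a dashboard could also preview what the current CPU reading would do to the instance count. This is a deliberately simplified sketch, not how the Azure autoscale engine works. It ignores time windows, aggregation and cooldowns, and assumes scale-up fires above UpThreshold and scale-down below DownThreshold. ScaleSettings and ScalePreview are hypothetical names. MinInstances and MaxInstances are hypothetical extra fields you'd have to read from the profile yourself.

```csharp
using System;

// Hypothetical, simplified view of the two scale rules plus an instance range.
public sealed class ScaleSettings
{
    public int UpCount { get; set; }
    public double UpThreshold { get; set; }
    public int DownCount { get; set; }
    public double DownThreshold { get; set; }
    public int MinInstances { get; set; }
    public int MaxInstances { get; set; }
}

public static class ScalePreview
{
    // Predicts the instance count after one rule evaluation at the given CPU percentage.
    public static int NextInstanceCount(ScaleSettings s, int current, double cpuPercent)
    {
        int next = current;
        if (cpuPercent > s.UpThreshold)
            next = current + s.UpCount;
        else if (cpuPercent < s.DownThreshold)
            next = current - s.DownCount;

        // keep the prediction inside the configured instance range
        return Math.Max(s.MinInstances, Math.Min(s.MaxInstances, next));
    }
}
```

With the sample settings in the next section (up at 70, down at 50, one instance each way), a reading of about 56% sits between the thresholds, so the preview leaves the count unchanged.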

Sample Data

I've wrapped all this into a CloudServiceLoader class and when I call it with a service name (e.g. my-api-prd) then the populated model looks like this:

{
  "Name": "my-api-prd",
  "DeploymentLabel": "My-Api-PreProduction_Build#10",
  "Status": "Running",
  "CurrentScale": {
    "Count": 2,
    "Size": "Small"
  },
  "AutoScale": {
    "UpCount": 1,
    "UpMetricName": "Percentage CPU",
    "UpThreshold": 70.0,
    "DownCount": 1,
    "DownMetricName": "Percentage CPU",
    "DownThreshold": 50.0
  },
  "LatestCpu": {
    "Value": 56.278934000000007,
    "MetricName": "Percentage CPU"
  }
}
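The model classes themselves aren't listed, but the sample JSON pins down their shape. This is a minimal sketch that assumes plain properties named after the JSON keys. It uses System.Text.Json (available from .NET Core 3.0 onwards) for the serialisation. ModelJson is a hypothetical helper name.

```csharp
using System.Text.Json;

// Model classes inferred from the sample JSON; the real classes may differ.
public class CloudService
{
    public string Name { get; set; }
    public string DeploymentLabel { get; set; }
    public string Status { get; set; }
    public CurrentScale CurrentScale { get; set; }
    public AutoScale AutoScale { get; set; }
    public CpuMetric LatestCpu { get; set; }
}

public class CurrentScale
{
    public int Count { get; set; }
    public string Size { get; set; }
}

public class AutoScale
{
    public int UpCount { get; set; }
    public string UpMetricName { get; set; }
    public double UpThreshold { get; set; }
    public int DownCount { get; set; }
    public string DownMetricName { get; set; }
    public double DownThreshold { get; set; }
}

public class CpuMetric
{
    public double Value { get; set; }
    public string MetricName { get; set; }
}

public static class ModelJson
{
    // Serialises a populated model as indented JSON for the dashboard front end.
    public static string ToJson(CloudService model) =>
        JsonSerializer.Serialize(model, new JsonSerializerOptions { WriteIndented = true });
}
```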

It's easy to pull that data out into your dashboard and just show the bits you want, and have it refreshed at your own interval.

Dude, where's the github link?

Coming. I'm going to put the whole dashboard up onto github soon, with a super easy install so you can publish one in your own subscription without writing any code.
