The value of a Diagnostics service

For an integration solution with lots of dependencies, it's difficult to definitively state that everything is wired up correctly and that all components in the stack are working. A while ago, Michael Stephenson proposed having a diagnostic service in a solution as a quick way of verifying that a given environment was working, end-to-end. Initially I was sceptical, on the grounds that there shouldn't be any code in a solution which isn't directly solving a business problem, but having tried it out I've been won over.

We now have a dedicated suite of diagnostics services which check a whole range of things, including availability of all the downstream services, the in-process and out-of-process caches, and the accessibility of BizTalk file drops. We also include environmental details like the version number of the solution, machine name, server date and time, IP addresses, etc. And we have a rolled-up service which gives us a green or red status for the solution.
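
To make the rolled-up idea concrete, here is a minimal sketch in Python. It is an illustration only, not the actual implementation: the function names (`run_checks`, `rolled_up_status`) and the example check names are hypothetical, and each check is assumed to be a simple callable that returns true when its dependency is healthy.

```python
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every named check; a check that raises counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A diagnostic should report a broken dependency, not crash on it.
            results[name] = False
    return results


def rolled_up_status(results: Dict[str, bool]) -> str:
    """Green only if at least one check ran and all of them passed."""
    return "green" if results and all(results.values()) else "red"


if __name__ == "__main__":
    # Hypothetical checks standing in for real probes of services and caches.
    results = run_checks({
        "downstream-service": lambda: True,
        "out-of-process-cache": lambda: True,
    })
    print(results, rolled_up_status(results))
```

Treating an exception as a failed check keeps the roll-up honest: one unreachable dependency turns the whole status red, while the detailed results still show which check failed.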

For devs and technical owners of the solution, this gives us a very detailed view of the health of the stack. We also expose REST endpoints for the general healthcheck services, and we now use these all over the place:

in our automated deployment process, we send an email after an environment has been deployed. That email contains the output of the healthcheck service, so we can tell if a deployment was successful;
the REST URLs are published to the team, so testers and service consumers can quickly check if the environment is operational before starting work on a release;
we have something like 10 environments which are all catalogued, and the REST URL for the version number service means we're not manually updating version numbers in the catalogue after a release;
we have nightly releases of the integration solution and the downstream services, so we have a simple PowerShell script, run from a Windows scheduled task every morning, which checks the output of the REST services and emails the team with the test environment status;
calling the diagnostic service warms up the server app pool, and all the app pools of the downstream services, so the scheduled job warms everything up after a recycle.
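
The morning status check in the list above can be sketched roughly as follows. The real job is a PowerShell script; this is a Python stand-in for illustration, and the environment names, URLs, and the `fetch` and `build_report` helpers are all assumptions. It only builds the report text and leaves sending the email out.

```python
import urllib.request
from typing import Callable, Dict


def fetch(url: str, timeout: float = 30.0) -> str:
    """GET a healthcheck URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def build_report(endpoints: Dict[str, str],
                 get: Callable[[str], str] = fetch) -> str:
    """Return one line per environment: its healthcheck output or the failure."""
    lines = []
    for env, url in endpoints.items():
        try:
            body = get(url).strip()
        except Exception as exc:
            # An unreachable environment is a result worth reporting too.
            body = f"UNREACHABLE ({exc})"
        lines.append(f"{env}: {body}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical endpoints; substitute the published healthcheck URLs.
    print(build_report({
        "test1": "http://test1.example.invalid/diagnostics/healthcheck",
        "test2": "http://test2.example.invalid/diagnostics/healthcheck",
    }))
```

Passing the fetch function in as a parameter makes it possible to exercise the report logic without a network, and the same request that gathers the status is what warms up the app pools.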

With this simple stuff in place we're saving a bunch of time investigating problems which turn out not to be problems in the integration solution, and the offshore test team can quickly see if there are problems with the stack before investing time in tests. Highly recommended.
