Geeks With Blogs

This is the *old* blog. The new one is at blog.sixeyed.com
Elton Stoneman

We recently went live with a reporting solution which sends details of over-the-counter trades to a global trade repository. It's a fairly complex solution, but we had a great team and did lots of dry-run testing, and the initial load of 250K open trades was stress-free.

In the run-up to go-live we'd given some thought to failure scenarios and how we would recover from them. The solution has some significant external dependencies, and if they didn't behave as expected we could be left with large numbers of trades stuck in our solution and not reported.

There are several steps to reporting a trade, and we've split the workflow out using messaging. That gives us all the expected benefits of reliability against system failure, and linear scale, but it also means there are several entry points to the process that we can use for recovery scenarios.

If we had a monolithic workflow with a single entry point, our recovery options would be limited. We could identify trades with problems and start the workflow again for each trade, but if they had already been partly processed we'd potentially end up with inconsistent data.

As it is, we have a pipeline that we can start up at key points. So when we've collected all the enrichment data we need, we send a "GenerateReport" message. The handler for that just builds the outgoing report in XML, stores it in the database and sends a "SubmitReport" message. So if we had a problem with the connection to the trade repository, we could identify all affected trades and send another "SubmitReport" message - so the interrupted workflow would continue where it left off.
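A minimal sketch of what such a handler might look like. The `IBus` and `IReportStore` interfaces, their in-memory implementations and the placeholder XML are all assumptions made for illustration; they are not the real solution's code or any particular messaging framework's API.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public class GenerateReportMessage { public string TradeId { get; set; } }
public class SubmitReportMessage { public string TradeId { get; set; } }

// Hypothetical bus abstraction; a real system would use its messaging framework here.
public interface IBus { void Send(object message); }

// Hypothetical store standing in for the database.
public interface IReportStore { void Save(string tradeId, string reportXml); }

public class InMemoryBus : IBus
{
    public readonly List<object> Sent = new List<object>();
    public void Send(object message) { Sent.Add(message); }
}

public class InMemoryReportStore : IReportStore
{
    public readonly Dictionary<string, string> Reports = new Dictionary<string, string>();
    public void Save(string tradeId, string reportXml) { Reports[tradeId] = reportXml; }
}

public class GenerateReportHandler
{
    private readonly IBus _bus;
    private readonly IReportStore _store;

    public GenerateReportHandler(IBus bus, IReportStore store)
    {
        _bus = bus;
        _store = store;
    }

    public void Handle(GenerateReportMessage message)
    {
        // Build the outgoing report (placeholder element, not the real report schema).
        var xml = new XElement("report", new XAttribute("tradeId", message.TradeId)).ToString();

        // Persist the report before handing off to the next step.
        _store.Save(message.TradeId, xml);

        // The next step only needs the trade ID, so this same message
        // can be sent again later to resume from this point.
        _bus.Send(new SubmitReportMessage { TradeId = message.TradeId });
    }
}
```

Because the submit step only needs the trade ID and reads the stored report itself, resuming after a repository outage comes down to sending a fresh `SubmitReportMessage` for each affected trade.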

To facilitate that in bulk we built a very simple admin tool. It has a text box where you paste a SQL statement, a dropdown box where you select a message type, and a button to start the process. When you click the button it runs a count over your SQL to show how many trades you're impacting, and then if you confirm you want to go ahead it runs the SQL and for each row that returns it builds a message and sends it.
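The count, confirm, send flow could be sketched roughly as below. `BulkResendTool` and its delegate parameters are hypothetical names; data access is passed in as delegates so the sketch stays independent of any database provider, and how the count is actually run (for example by wrapping the pasted SQL in a `SELECT COUNT(*)`) is an assumption.

```csharp
using System;
using System.Collections.Generic;

public class BulkResendTool
{
    // Hypothetical: returns how many rows the pasted SQL would affect.
    private readonly Func<string, int> _countRows;
    // Hypothetical: runs the pasted SQL and yields each row as column name -> value.
    private readonly Func<string, IEnumerable<IDictionary<string, object>>> _queryRows;
    // Hypothetical: puts a message on the bus.
    private readonly Action<object> _send;

    public BulkResendTool(
        Func<string, int> countRows,
        Func<string, IEnumerable<IDictionary<string, object>>> queryRows,
        Action<object> send)
    {
        _countRows = countRows;
        _queryRows = queryRows;
        _send = send;
    }

    // Returns the number of messages sent (0 if the operator declines).
    public int Run(
        string sql,
        Func<IDictionary<string, object>, object> buildMessage,
        Func<int, bool> confirm)
    {
        // Show the operator how many trades they are about to touch.
        var affected = _countRows(sql);
        if (!confirm(affected))
        {
            return 0;
        }

        // One message per returned row.
        var sent = 0;
        foreach (var row in _queryRows(sql))
        {
            _send(buildMessage(row));
            sent++;
        }
        return sent;
    }
}
```

Keeping the confirmation as a separate step between the count and the send is what gives the operator a chance to spot a query that matches far more trades than intended.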

For maximum flexibility, we allow you to pick any message type that has a handler, so we can resume the workflow at any point rather than just the main ones we think we'll need. Different message types have different input properties, and the SQL needs to return enough data to populate the message. We do that simply by using reflection and matching the SQL column names to the message properties, so with this message structure:

public class SubmitReportMessage
{
    public string TradeId { get; set; }

    public string ReportingRegime { get; set; }
}

- we'd need to craft SQL like this:

SELECT Id as TradeId, 'EMIR' as ReportingRegime
FROM xyz
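The reflection step that matches column names to message properties might look something like this. `MessageBuilder` is a hypothetical name, and the row is represented as a dictionary of column name to value; in a real tool it would more likely be read from a data reader. The message class is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class SubmitReportMessage
{
    public string TradeId { get; set; }
    public string ReportingRegime { get; set; }
}

public static class MessageBuilder
{
    // Creates a message of the given type and copies in every column
    // whose name matches a writable public property (case-insensitive).
    public static object Build(Type messageType, IDictionary<string, object> row)
    {
        var message = Activator.CreateInstance(messageType);
        foreach (var column in row)
        {
            var property = messageType.GetProperty(
                column.Key,
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);

            // Columns with no matching property are ignored.
            if (property == null || !property.CanWrite)
            {
                continue;
            }

            // Leave the property at its default for SQL NULLs.
            if (column.Value == null || column.Value is DBNull)
            {
                continue;
            }

            // Convert the column value to the property type if it isn't already assignable.
            var target = Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType;
            var value = target.IsInstanceOfType(column.Value)
                ? column.Value
                : Convert.ChangeType(column.Value, target);

            property.SetValue(message, value);
        }
        return message;
    }
}
```

With the SQL above, each row carries `TradeId` and `ReportingRegime` columns, so `MessageBuilder.Build(typeof(SubmitReportMessage), row)` would return a fully populated message. A column the message doesn't know about is silently skipped in this sketch; a stricter tool might prefer to fail instead.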

The admin tool - known among the team as the Swiss Army Knife (as it can fix any problem), or the Weapon of Mass Destruction (as you can potentially do an awful lot of damage with it) - proved to be massively useful. We had a few expected failures, where static data was missing. The recovery there was to get the static data updated and then send the "Enrich" message for the affected trades. And we had a few unexpected failures where we lost the database connection. The recovery there was to send an interim message in the workflow to pick up where the processing had left off.

One thing we didn't do, which I'm thinking about, is for all entry-point messages to have a "BypassValidation" flag, or something similar, so for recovery situations we can force processing through even if the handler thinks the state isn't valid. 

For instance, the "SubmitReport" handler grabs the latest report from the database, and checks a flag to see if it's been sent before going on to send it and set the flag. In normal operation we need that validation, but in a recovery scenario we want to ignore the check and force the processing through. 
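As a sketch of the idea only (this is the thing we didn't build): the `BypassValidation` property, the `Report` class and the delegate-based handler below are all hypothetical, and the repository call is reduced to a callback.

```csharp
using System;

public class SubmitReportMessage
{
    public string TradeId { get; set; }
    public string ReportingRegime { get; set; }

    // Hypothetical flag: false in normal operation, true only for recovery.
    public bool BypassValidation { get; set; }
}

// Hypothetical stand-in for the stored report row.
public class Report
{
    public string Xml { get; set; }
    public bool Sent { get; set; }
}

public class SubmitReportHandler
{
    private readonly Func<string, Report> _getLatestReport;
    private readonly Action<Report> _sendToRepository;

    public SubmitReportHandler(Func<string, Report> getLatestReport, Action<Report> sendToRepository)
    {
        _getLatestReport = getLatestReport;
        _sendToRepository = sendToRepository;
    }

    // Returns true if the report was submitted, false if it was skipped.
    public bool Handle(SubmitReportMessage message)
    {
        var report = _getLatestReport(message.TradeId);

        // Normal operation: never send the same report twice.
        // Recovery: the operator can force the send through.
        if (report.Sent && !message.BypassValidation)
        {
            return false;
        }

        _sendToRepository(report);
        report.Sent = true;
        return true;
    }
}
```

Defaulting the flag to false means existing senders keep the current behaviour, and only messages built deliberately for recovery skip the check.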

That would make the admin tool all-powerful and let us recover from scenarios without having to do an additional data fix. But it would make it even more dangerous...

Posted on Sunday, February 9, 2014 10:26 PM | nServiceBus


Comments on this post: Messaging, recovery scenarios and bypassing validation

# re: Messaging, recovery scenarios and bypassing validation
Was this built with NServiceBus?

Left by Udi Dahan on Feb 10, 2014 6:44 AM


Copyright © Elton Stoneman | Powered by: GeeksWithBlogs.net