I've been looking into a solution recently which has a series of remote nodes publishing events to a central node. The publishers have no logic, they just need to contact the central node and record that an event has happened. Off the back of that, the central node does some work inferring what the event means, and computing its relationship to a bunch of other events. Simple enough, but the solution needs to be scalable and make efficient use of its resources.
I started off building it around NServiceBus, but when I had a better understanding of the components and the physical architecture, decided on a different approach. The central node will most likely be an EC2 cluster; using NServiceBus, cluster nodes can communicate with each other's queues, but the remote nodes would need a bridging mechanism to talk to the central node, and I'd rather have a consistent communication model. So I started rebuilding the service layer in WCF, but I wanted flexibility on how the central cluster nodes distributed work amongst themselves.
Briefly, I wanted a nice solution for this scenario: remote nodes will always talk to the central cluster over HTTP with a REST service; central nodes will distribute load amongst themselves as efficiently as they can. In a low-load situation, the node receiving the originating message should do all the work in the workflow. In a high-load situation, the node receiving the originating message should farm out work to the other nodes. EC2 will let you set up a nice load balancer, but the option to balance load between nodes OR within a node needs a bit more thought - i.e. when a cluster node receives an event published message, it can call the next service in the workflow either by remotely sending a WCF service request to the load balanced URL, or by executing the code locally.
This is actually very simple, and I've published a working example on my github code gallery: DistributedServiceSample. In the sample, all service calls in the central node are run through a generic service invoker:
>(svc => svc.Compute(jobId));
The invoker either builds a WCF client proxy, or instantiates the service implementation locally, and makes the call. In the sample the logic for deciding whether to go remote or local is done in config, but this could be worked into something more complex based on current capacity etc. The sample also allows the invoker to decide whether to make a synchronous call, or farm the work out to an async task (again, through config).
Dynamically building a WCF client proxy is very simple, there are no dependencies outside .NET and you can leave all the endpoint configuration to the normal <serviceModel> config section. When the central node is in remote mode, it gets the proxy like this:var factory = newChannelFactory<TService>("*");
service = factory.CreateChannel();
(The asterisk tells WCF to pull the client endpoint and binding from config based on the contract name and assumes there is only one client entry per contract).
When in local mode, it's a little bit more involved to do it dynamically. In the sample I have a marker interface (IService) to denote a service contract. In the service application startup I register all service implementations in an IoC container (a wrapper around Unity), and then the central node gets the service like this:
service = Container.Get<TService>();
The actual method call on the service is done through functions or actions (depending on whether the service returns a response), so it's all typesafe.
Running the sample
Build the sample locally, and open http://localhost/DistributedServiceSample.Services/JobService.svc in WCFStorm or soapUI. Call CreateJob with whatever parameter you like, and you will see output similar to this in DebugView:
JobService.CreateJob called with Name: fhwfiy
Service.GetExecutionMode using *Synchronous* for service: DistributedServiceSample.Contracts.Services.IJobService
Service.GetServiceLocation using *Remote* for service: DistributedServiceSample.Contracts.Services.IJobService
JobService.SaveJob called with jobId: fhwfiy
Service.GetExecutionMode using *AsynchronousIgnoreResponse* for service: DistributedServiceSample.Contracts.Services.IComputeService
Service.GetServiceLocation using *Local* for service: DistributedServiceSample.Contracts.Services.IComputeService
ComputeService.Compute called with jobId: 621849676
The workflow is that CreateJob triggers a SaveJob call, which in turn triggers a Compute call. The service decides whether each downstream call will be made locally or remotely, synchronously or asynchronously based on the contents of the <distributedservicesample.invoker> section in Web.config.
The obvious extension is to add an operation name to the config settings, so different operations within the same service can be executed in different ways, which is pretty straightforward. More complex is the idea of dynamically deciding whether to make a local or a remote call. The logic for the decision is all isolated, so it would be a case of swapping out the config stuff with some environment checks, so calls were made locally unless CPU or private bytes or current connections were above a threshold.