<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:copyright="http://blogs.law.harvard.edu/tech/rss" xmlns:image="http://purl.org/rss/1.0/modules/image/">
    <channel>
        <title>Disaster Recovery</title>
        <link>http://geekswithblogs.net/mattjgilbert/category/9282.aspx</link>
        <description>Disaster Recovery</description>
        <language>en-GB</language>
        <copyright>mattjgilbert</copyright>
        <managingEditor>mattjgilbert@yahoo.co.uk</managingEditor>
        <generator>Subtext Version 0.0.0.0</generator>
        <item>
            <title>High Availability</title>
            <link>http://geekswithblogs.net/mattjgilbert/archive/2010/05/27/high-availability.aspx</link>
            <description>&lt;p&gt;&lt;a href="http://www.udidahan.com/"&gt;Udi Dahan&lt;/a&gt; presented at the &lt;a href="http://ukconnectedsystemsusergroup.org/default.aspx"&gt;UK Connected Systems User Group&lt;/a&gt; last night. He discussed High Availability and pointed out that people often think this is purely an infrastructure challenge. However, the implications of system crashes, errors and resulting data loss need to be considered and managed by software developers. In addition, a system should remain both highly reliable (backwardly compatible) and available during deployments and upgrades. The argument is that you cannot be considered highly available if your system is down every time you upgrade.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
For our recent BizTalk 2009 upgrade we made use of our Business Continuity servers (note the name, rather than calling them Disaster Recovery servers) to ensure our clients could continue to operate while we upgraded the Production BizTalk servers. Then we failed back to the newly built 2009 environment and rebuilt the BC servers. Of course, in the event of an actual disaster there was a window where one set or the other was not available to take over. However, our Staging machines were already primed to switch to production settings, having been used for testing the upgrade in the first place.&lt;br /&gt;
 &lt;/p&gt;
&lt;p&gt;While not perfect (the failover between environments was not automatic and involved a minimal outage), planning the upgrade in this way meant BizTalk stayed online during the rebuild and upgrade project, we didn’t have to rush things to get back online, and we were ready to be as available as we could be in the event of an actual disaster.&lt;/p&gt; &lt;img src="http://geekswithblogs.net/mattjgilbert/aggbug/140103.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>mattjgilbert</dc:creator>
            <guid>http://geekswithblogs.net/mattjgilbert/archive/2010/05/27/high-availability.aspx</guid>
            <pubDate>Thu, 27 May 2010 10:58:56 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/mattjgilbert/comments/140103.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/mattjgilbert/archive/2010/05/27/high-availability.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/mattjgilbert/comments/commentRss/140103.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Planning for BizTalk DR</title>
            <link>http://geekswithblogs.net/mattjgilbert/archive/2008/12/15/planning-for-biztalk-dr.aspx</link>
            <description>&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;In my current company, we have a fairly good DR story with our “Global” (UK based) BizTalk platform and it’s something that is regularly tested (successfully I might add :) ). We also have a smaller BizTalk deployment in the US which follows a different model but which was also proven successful this year for the first time. More on that in a bit.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;The choice of DR model you adopt for BizTalk will depend on a number of factors:&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;Criticality&lt;/strong&gt; – how important is the information flowing through or orchestrated by BizTalk, either now or planned?&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;Recovery Point Objective&lt;/strong&gt; – Your RPO is dictated by how accurate/up to date your system needs to be when DR is invoked. Can you afford to lose the last 15 minutes of activity? Can you afford to lose nothing?&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;Recovery Time Objective&lt;/strong&gt; – Your RTO is dictated by the role BizTalk plays in your environment and is tied to Criticality and RPO. Do you need to be back on line in seconds, minutes, hours or days? Your figures for both RTO and RPO will be driven by the SLA you have with the Business for meeting your DR targets.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;Cost&lt;/strong&gt; – are you paying for a mirror of production or a reduced capacity? Are your servers going to be on cold or hot standby? Do you have Software Assurance on your production licenses? Do you need to extend an existing DR contract to accommodate the BizTalk DR requirements?&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;There are obviously more things to consider but these are some important ones to be starting with. Let’s take a look at each in a little more detail.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;&lt;u&gt;Criticality&lt;/u&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;Most BizTalk implementations are at the heart of a business and are therefore usually flagged as a “Class A” system when it comes to planning for DR. What flows through BizTalk can be a mixed bag, however, so you may want to consider, as part of the BizTalk team, internally classifying the various interfaces you are dealing with in terms of their priority. The chances are you will have done this already as part of the day-to-day support process (e.g. you know that if interface A breaks at 2am, people get up to fix it, but if a message goes suspended on process B it can wait until the next working day). You can then apply the same thinking to your recovery planning if you need to phase when elements of your solution come back online.&lt;/div&gt;
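&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;As a rough illustration of phasing recovery by priority, here is a minimal Python sketch. The interface names and priority classes are hypothetical placeholders, not real interfaces; it simply groups interfaces by class so that the most urgent ones come back first.&lt;/div&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical interface names and priority classes, for illustration only.
# Class 1 = someone gets up at 2am to fix it; class 3 = next working day.
INTERFACES = {
    "interface_a": 1,
    "process_b": 3,
    "interface_c": 2,
    "interface_d": 1,
}


def recovery_phases(interfaces):
    """Group interface names by priority class, most urgent phase first."""
    phases = defaultdict(list)
    for name, priority in interfaces.items():
        phases[priority].append(name)
    # Sort phases by priority, and names within a phase for a stable runbook.
    return [(priority, sorted(names)) for priority, names in sorted(phases.items())]


if __name__ == "__main__":
    for priority, names in recovery_phases(INTERFACES):
        print(f"Phase {priority}: {', '.join(names)}")
```
&lt;/pre&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;With the sample data this puts interface_a and interface_d in phase 1, interface_c in phase 2 and process_b in phase 3. In practice the classes would come from whatever support priorities you already maintain.&lt;/div&gt;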
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;&lt;u&gt;RPO&lt;/u&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;Your RPO represents the data point you are trying to get back to when you recover your systems. This is one of the key things that will force your hand when it comes to deciding on a DR solution. Factors that will dictate a very stringent RPO, as close to real time as possible, include:&lt;/div&gt;
&lt;div style="TEXT-INDENT: -18pt; MARGIN: 0cm 0cm 10pt 19.5pt"&gt;&lt;span&gt;-&lt;span style="FONT: 7pt 'Times New Roman'"&gt;&lt;font face="Arial"&gt;          &lt;/font&gt;&lt;/span&gt;&lt;/span&gt;a high throughput of critical transactions which will cost the business money if they are lost&lt;/div&gt;
&lt;div style="TEXT-INDENT: -18pt; MARGIN: 0cm 0cm 10pt 19.5pt"&gt;&lt;span&gt;-&lt;span style="FONT: 7pt 'Times New Roman'"&gt;&lt;font face="Arial"&gt;          &lt;/font&gt;&lt;/span&gt;&lt;/span&gt;long running transactions and complex business processes where the loss of messages or recent activity makes it very hard to replay or resynchronise systems.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;However, the processes you are orchestrating or the messages you are routing may be considered low priority or easily recovered and replayed, in which case your target RPO might be minutes or even hours before DR was invoked (if at all).&lt;/div&gt;
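&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;To make the RPO question concrete, here is a small Python sketch that checks a replication schedule against an agreed RPO. It assumes periodic log shipping as the replication mechanism and uses a deliberately simple model (worst-case loss is one shipping interval plus any copy lag). The function names and figures are illustrative assumptions, not measurements from a real system.&lt;/div&gt;
&lt;pre&gt;
```python
def worst_case_loss_minutes(ship_interval_min, copy_lag_min=0.0):
    """Rough upper bound on lost activity with periodic log shipping.

    Simplifying assumption: anything written since the last shipped log is
    lost, plus whatever was still being copied when the disaster hit.
    """
    return ship_interval_min + copy_lag_min


def meets_rpo(rpo_min, ship_interval_min, copy_lag_min=0.0):
    """True if the worst-case loss fits inside the agreed RPO."""
    return rpo_min >= worst_case_loss_minutes(ship_interval_min, copy_lag_min)


if __name__ == "__main__":
    # Illustrative figures only.
    print(meets_rpo(rpo_min=15, ship_interval_min=15))
    print(meets_rpo(rpo_min=15, ship_interval_min=15, copy_lag_min=2))
    print(meets_rpo(rpo_min=60, ship_interval_min=15, copy_lag_min=2))
```
&lt;/pre&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;The point of the second call is that a shipping interval equal to the RPO leaves no headroom: any copy lag at all pushes the worst case past the target.&lt;/div&gt;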
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;&lt;strong&gt;&lt;u&gt;RTO&lt;/u&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;As mentioned already, RTO is going to be derived from the Criticality and RPO that have been agreed. It’s no good having an RTO of 2 hours if you have critical systems which require you to be back online in less than 5 minutes. Your RTO will help make the decision about the type of DR environment you need to establish and that will then have an effect on the costs involved.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;&lt;strong&gt;&lt;u&gt;Cost&lt;/u&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;Obviously, the closer to production and to zero data loss you aim for in your DR solution, the more complex and expensive it becomes. You also need to consider the licensing implications of having BizTalk servers (and SQL servers…and anything else) at a DR site. Production licenses purchased under Software Assurance get some breaks here as discussed in this &lt;font color="#800080"&gt;&lt;a target="_blank" href="http://download.microsoft.com/download/8/7/3/8733d036-92b0-4cb8-8912-3b6ab966b8b2/dr_brief.doc"&gt;document&lt;/a&gt;&lt;/font&gt; but otherwise more boxes can often mean more money, and not just in terms of hardware.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt 1.5pt"&gt;Low cost options (beyond not bothering with DR) include:&lt;/div&gt;
&lt;div style="TEXT-INDENT: -18pt; MARGIN: 0cm 0cm 10pt 19.5pt"&gt;&lt;span&gt;-&lt;span style="FONT: 7pt 'Times New Roman'"&gt;&lt;font face="Arial"&gt;          &lt;/font&gt;&lt;/span&gt;&lt;/span&gt;Having a cold-standby server ready to fire up when DR is invoked (possible license required)&lt;/div&gt;
&lt;div style="TEXT-INDENT: -18pt; MARGIN: 0cm 0cm 10pt 19.5pt"&gt;&lt;span&gt;-&lt;span style="FONT: 7pt 'Times New Roman'"&gt;&lt;font face="Arial"&gt;          &lt;/font&gt;&lt;/span&gt;&lt;/span&gt;Having hardware provided only and building the platform from scratch (obviously this impacts RTO a great deal)&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;Some high cost options would be:&lt;/div&gt;
&lt;div style="TEXT-INDENT: -18pt; MARGIN: 0cm 0cm 10pt 19.5pt"&gt;&lt;span&gt;-&lt;span style="FONT: 7pt 'Times New Roman'"&gt;&lt;font face="Arial"&gt;          &lt;/font&gt;&lt;/span&gt;&lt;/span&gt;Matching all your production environment server for server (with licenses)&lt;/div&gt;
&lt;div style="TEXT-INDENT: -18pt; MARGIN: 0cm 0cm 10pt 19.5pt"&gt;&lt;span&gt;-&lt;span style="FONT: 7pt 'Times New Roman'"&gt;&lt;font face="Arial"&gt;          &lt;/font&gt;&lt;/span&gt;&lt;/span&gt;Achieving a near real-time RPO with SAN Mirroring, cross-site clustering or similar&lt;/div&gt;
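&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;The trade-off between RTO, RPO and cost can be sketched as picking the cheapest option that still meets both targets. The Python below is only an illustration: the option names echo the lists above, but every RTO, RPO and cost figure is an invented placeholder, and the DrOption class and cheapest_option function are hypothetical names.&lt;/div&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    name: str
    rto_hours: float      # assumed time to be back online
    rpo_minutes: float    # assumed worst-case data loss
    relative_cost: int    # made-up ranking, 1 = cheapest


# All figures below are invented placeholders, not measurements.
OPTIONS = [
    DrOption("hardware only, build from scratch", 72.0, 1440.0, 1),
    DrOption("cold-standby server", 4.0, 60.0, 2),
    DrOption("server-for-server mirror of production", 1.0, 15.0, 3),
    DrOption("SAN mirroring / cross-site clustering", 0.25, 1.0, 4),
]


def cheapest_option(options, target_rto_hours, target_rpo_minutes):
    """Return the lowest-cost option meeting both targets, or None."""
    viable = [
        o for o in options
        if target_rto_hours >= o.rto_hours and target_rpo_minutes >= o.rpo_minutes
    ]
    return min(viable, key=lambda o: o.relative_cost, default=None)


if __name__ == "__main__":
    choice = cheapest_option(OPTIONS, target_rto_hours=8, target_rpo_minutes=120)
    print(choice.name if choice else "no option meets the targets")
```
&lt;/pre&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;With these placeholder numbers, an 8 hour RTO and 2 hour RPO are met by the cold-standby server, while tightening the targets pushes you towards the high cost options (or to no viable option at all, which is a cue to revisit the targets with the Business).&lt;/div&gt;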
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;Other considerations&lt;/strong&gt;&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;em&gt;Capacity&lt;/em&gt;: Does your DR offering need to be as performant as your production environment? Can you get away with lower grade hardware, less memory or fewer servers? Is the disk sub-system on the DR SQL server going to be good enough for the low latency or high volume requirements you have? If you have fewer servers, does this affect the way you distribute your hosts and workload? Has any known reduction in capability (should DR be invoked) been publicised so you aren’t being asked to meet expectations that are unrealistic?&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;em&gt;Peripheral components&lt;/em&gt;: Don’t forget those things you own and interact with that are part of your solution, like custom databases, web sites/services and repositories. Your DR story needs to include them too, and they need factoring into your plans for achieving RTO and RPO and for how you are going to perform in DR.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;em&gt;Testing&lt;/em&gt;: Having a good DR story is all very well but if you don’t test it you cannot have any confidence it’s going to work! Regularly scheduled DR tests are a must and from experience, these need to be in a closed off network; if you are simulating your data-centre going off line – make sure your DR site isn’t talking to it!&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;&lt;strong&gt;&lt;u&gt;So what are we doing then?&lt;/u&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;We have two BizTalk platforms, one in the UK supporting the “Global” application and systems and a smaller one in the US which deals with local systems. For various historical reasons we have different DR solutions for each one.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;Our UK DR solution consists of three servers (1 BTS, 1 SQL and 1 Comms (FTP, WebSphere MQ etc)). Compare this to the production environment, which is 4 BizTalk servers, an active-active SQL Cluster and an active-passive comms cluster. The reduced capacity is a known and accepted compromise. The servers are in a dormant state (ENTSSO is disabled on the BizTalk box) and when deployments are made to production they are replayed to DR a few days later (e.g. the Monday following a Saturday deployment). The bulk of what we do in BizTalk is message routing and transformation, and we have no long-running processes whose consistency we would need to maintain across a disaster. That means our solution is not as costly as it could otherwise have been. However, we are responsible for a large number of high-volume critical interfaces, so our RTO is matched with theirs. If DR is invoked, we simply have to bring ENTSSO and the BizTalk services back online and we should be good to go. This has been tested yearly since we went live about 5 years ago and has always been successful. The big win for us is effectively not being set an RPO. The critical systems we deal with have a very generous 15 minute RPO due to their log shipping schedule and, as most traffic passes through us in seconds, there is no point trying to match it.&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;Until the last test, our US solution had always failed. The RTO was in days and there was no dedicated hardware. The support team in the US used manual build scripts to rebuild the platform once hardware was provided to them. Due to poor documentation and the complexity of deployments and 3&lt;sup&gt;rd&lt;/sup&gt; party tools, this was never successful. This year we tried something different: VM snapshots of the BizTalk servers spun up on top of database backups. This worked :) The approach has obvious flaws though: you need to re-image the VM on every change to the BTS environment, or replay any changes made between the last VM image and the DR invocation (which impacts the speed of recovery).&lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt; &lt;/div&gt;
&lt;div style="MARGIN: 0cm 0cm 10pt"&gt;So there is a lot to consider and every company is going to have different requirements, but hopefully some of what I’ve detailed above will get you started along the road to a good DR offering.&lt;/div&gt; &lt;img src="http://geekswithblogs.net/mattjgilbert/aggbug/127910.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>mattjgilbert</dc:creator>
            <guid>http://geekswithblogs.net/mattjgilbert/archive/2008/12/15/planning-for-biztalk-dr.aspx</guid>
            <pubDate>Mon, 15 Dec 2008 22:01:41 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/mattjgilbert/comments/127910.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/mattjgilbert/archive/2008/12/15/planning-for-biztalk-dr.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/mattjgilbert/comments/commentRss/127910.aspx</wfw:commentRss>
        </item>
    </channel>
</rss>
