Michael Stephenson

keeping your feet on premise while your heads in the cloud

Windows Azure Service Bus Relay – Certificate Trust Chain Verification Error

Recently we had a problem with our IIS Hosted WCF services which were listening to the Windows Azure Service Bus Relay. The symptoms were as follows:

  1. Only the production environment was affected
  2. All other environments were fine
  3. Both servers hosting the services went down at approximately the same time

When we tried to restart them, we got a certificate trust verification error like the one below.

WebHost failed to process a request. Sender Information: System.ServiceModel.ServiceHostingEnvironment+HostingManager/35320229 Exception:

System.ServiceModel.ServiceActivationException: The service '/MyIISApplication/MyWCFService.svc' cannot be activated due to an exception during compilation.

The exception message is: The X.509 certificate CN=servicebus.windows.net, OU=WindowsAzure, O=Microsoft, L=Redmond, S=WA, C=US chain building failed.

The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode.

The revocation function was unable to check revocation because the revocation server was offline.

. ---> System.ServiceModel.Security.SecurityNegotiationException: The X.509 certificate CN=servicebus.windows.net, OU=WindowsAzure, O=Microsoft, L=Redmond, S=WA, C=US chain building failed. The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode. The revocation function was unable to check revocation because the revocation server was offline.

---> System.IdentityModel.Tokens.SecurityTokenValidationException: The X.509 certificate CN=servicebus.windows.net, OU=WindowsAzure, O=Microsoft, L=Redmond, S=WA, C=US chain building failed. The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode. The revocation function was unable to check revocation because the revocation server was offline.

 

We had come across this previously in the test and development environments, but it only ever happened very occasionally. Normally we had been able to clear the credential cache or restart the app pools and it had always just worked. We had also reviewed some of the other articles online about similar errors and possible fixes, but none of them had ever seemed to work. Since the problem didn't really affect us in test/dev and always went away easily, it had never been given too much airtime.

This week we had a bigger issue: the production service had been running fine for months but suddenly stopped, and none of the old workarounds made any difference. To keep this article short I will skip most of the diagnostic steps we took while troubleshooting. The upshot was that we changed the user account running the app pool on one server and that server started working. On the other server the same steps didn't work.

Production service was restored, but not with the expected configuration, and we were still getting the above error on one server.

At this point we engaged Microsoft support through our Azure support agreement. Working with one of their engineers, and using netmon and the CAPI 2.0 logging available via Event Viewer, we could see errors showing that some of the certificates could not be verified. This matched entries in our proxy server logs showing that some URLs were being blocked. The blocked URLs were:

Our current configuration is as follows:

  • We have two servers which are listeners for Azure Service Bus Relay
  • Our firewall allows outbound connections from the two servers to the Azure datacentre over ports 80, 443 and 9350-9354
  • We configure the proxy server for access to our Azure Service Bus namespace's ACS endpoint
  • We configure the proxy server for access to a couple of other URLs which seem to be required. We used the ones given in the general guidance online, and also looked for any others which might be required during our early stages.
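As a rough way to confirm the firewall rule in the second bullet from one of the listener servers, a small script can try a plain TCP connection on each port. This is only a sketch: the host name you pass in is an assumption (use whatever endpoint your namespace resolves to), and a successful TCP connect says nothing about the proxy or the certificate checks.

```python
import socket

# Ports from the firewall rule described above: 80, 443 and 9350-9354.
RELAY_PORTS = [80, 443] + list(range(9350, 9355))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def blocked_ports(host, ports=RELAY_PORTS, timeout=3.0):
    """Return the ports in `ports` that could not be reached on `host`."""
    return [p for p in ports if not can_connect(host, p, timeout)]


# Example (hypothetical host name, replace with your own namespace):
# print(blocked_ports("mynamespace.servicebus.windows.net"))
```

An empty list means every port answered; anything else is worth raising with whoever owns the firewall.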

Our proxy configuration at the time was as follows:

  • Source IP
    • <IP address of the on-premise servers>
  • Destination
    • <My namespace>-sb.accesscontrol.windows.net
    • mscrl.microsoft.com
    • crl.microsoft.com
    • www.public-trust.com

From the CAPI log and the netmon trace we could see that there were problems accessing certificate-related resources, which we assume were certificate updates or revocation lists. We were seeing things like:

  1. HTTP 403 Forbidden error code
  2. The proxy returning the error 'X-Squid-Error: ERR_ACCESS_DENIED 0', meaning the proxy was not allowing traffic to the requested URL.
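The two symptoms above can be checked for directly by requesting a URL through the proxy and looking at the status code and headers that come back. The sketch below assumes you know your proxy address and the URL that CAPI was trying to fetch; both are parameters here and nothing is hard-coded.

```python
import urllib.error
import urllib.request


def is_proxy_denied(status_code, headers):
    """Return True when a response looks like a Squid access-denied page.

    Matches the combination seen in our traces: HTTP 403 plus an
    X-Squid-Error header starting with ERR_ACCESS_DENIED.
    """
    squid_error = ""
    for name, value in headers.items():
        if name.lower() == "x-squid-error":
            squid_error = value
    return status_code == 403 and squid_error.strip().startswith("ERR_ACCESS_DENIED")


def probe(url, proxy_url, timeout=10):
    """Fetch `url` via `proxy_url` and return (status_code, headers)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry headers.
        return err.code, dict(err.headers.items())


# Example (both values are placeholders for your own environment):
# status, headers = probe("http://crl.microsoft.com/", "http://proxy.example:3128")
# print(is_proxy_denied(status, headers))
```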

In addition to the configuration above, our WCF service has been in production for a while and uses the 1.6 SDK, which has since been superseded by several releases. The service hadn't been updated for a while because it hadn't needed to be.

Based on the support call, your experience with this error may differ depending on which version of the SDK you are using, as outlined below.

   

Service Bus SDK 1.8 or Above

You should no longer get this issue because the SDK no longer checks for certificate revocation.

 

Service Bus SDK 1.7

This can be worked around by using the following snippet in the configuration file.

<configuration>
    <appSettings>
        <add key="Microsoft.ServiceBus.X509RevocationMode" value="NoCheck"/>
    </appSettings>
</configuration>

You should probably still consider looking into your proxy server to check what is being blocked.

 

Service Bus SDK 1.6

In our case this was caused by blocked addresses on our proxy server. We changed the settings on our Squid proxy to allow the following destinations:

<My Namespace Goes here>-sb.accesscontrol.windows.net

mscrl.microsoft.com

crl.microsoft.com

public-trust.com

verisign.com

windowsupdate.com

msftncsi.com

crl.omniroot.com
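If you collect the host names that show up in the CAPI 2.0 log or a netmon trace, a few lines of script will tell you which of them the allow list does not cover. This is a sketch under assumptions: the namespace entry is a made-up placeholder, and the matching rule (exact host or any subdomain) is my simplification, since what Squid actually matches depends on how the ACL is written.

```python
# Allow list from this article. The first entry uses a hypothetical
# namespace name ("mynamespace"); substitute your own.
ALLOWED_DOMAINS = [
    "mynamespace-sb.accesscontrol.windows.net",
    "mscrl.microsoft.com",
    "crl.microsoft.com",
    "public-trust.com",
    "verisign.com",
    "windowsupdate.com",
    "msftncsi.com",
    "crl.omniroot.com",
]


def is_allowed(host, allowed=ALLOWED_DOMAINS):
    """Return True if host equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


def uncovered(hosts, allowed=ALLOWED_DOMAINS):
    """Return the sorted, de-duplicated hosts not covered by the allow list."""
    return sorted({h for h in hosts if not is_allowed(h, allowed)})


# Example: hosts copied from a trace (example.org stands in for a missing one).
print(uncovered(["www.public-trust.com", "crl.microsoft.com", "example.org"]))
```

Anything the script prints is a candidate for either adding to the proxy or investigating further.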

Lessons Learnt

There are a couple of lessons we can take away from this.

  1. We need to set something up to report blocked addresses from our proxy server for this kind of situation. This had been working fine for ages, and then this week the network retrieval of certificate data was blocked. We need to know if this ever happens again before it affects service. When our solution is configured correctly we don't expect it to request any URLs that would be blocked, as they should all relate to the solution. It seems we were not aware of all of them.
  2. We need to agree a standard for updating the SDK. This component hadn't changed for months, yet it is already 5 versions behind the latest SDK, which is now 2.1.
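For the first lesson, one low-tech option is to summarise denied requests from the Squid access log on a schedule and alert when anything appears. The sketch below assumes Squid's default native log format, where the fourth field is the result code (for example TCP_DENIED/403) and the seventh is the URL. The sample lines are made up for illustration and are not taken from our logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def denied_hosts(log_lines):
    """Count TCP_DENIED entries per destination host in a Squid access log."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete native-format line
        result_code, url = fields[3], fields[6]
        if not result_code.startswith("TCP_DENIED"):
            continue
        if "://" in url:
            host = urlsplit(url).hostname
        else:
            # CONNECT requests are logged as "host:port" rather than a URL.
            host = url.rsplit(":", 1)[0]
        if host:
            counts[host] += 1
    return counts


# Illustrative sample data only (paths, IPs and timestamps are invented).
SAMPLE = [
    "1374063240.101 3 10.0.0.5 TCP_DENIED/403 3820 GET http://crl.microsoft.com/sample.crl - NONE/- text/html",
    "1374063241.202 2 10.0.0.6 TCP_DENIED/403 3820 GET http://crl.microsoft.com/sample.crl - NONE/- text/html",
    "1374063242.303 88 10.0.0.5 TCP_MISS/200 912 GET http://mscrl.microsoft.com/sample.crl - DIRECT/192.0.2.10 application/pkix-crl",
    "1374063243.404 1 10.0.0.5 TCP_DENIED/403 0 CONNECT example.invalid:443 - NONE/- -",
]

for host, count in denied_hosts(SAMPLE).most_common():
    print(host, count)
```

In practice you would pass an open log file instead of SAMPLE and filter on the source IPs of the listener servers, so that only traffic from the solution is reported.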

 

More Info

To find out more about using the CAPI 2.0 logging, refer to http://www.entrust.net/knowledge-base/technote.cfm?tn=8165

 

Posted on Wednesday, July 17, 2013 12:14 PM | Filed Under [ Azure Service Bus ]
