One of our .NET application teams had a long-running problem related to impersonation and Kerberos multi-hop delegation which proved quite difficult to resolve. We eventually fixed it, and I thought it would be worth putting a little information out there in case anyone else has a similar problem.
We had two web services, A and B, with two methods that participate in a Kerberos multi-hop delegation scenario using WSE 2. One of the methods worked fine all of the time; the other failed intermittently.
The symptoms we experienced were:
- For method 1 everything worked fine all of the time
- Method 2 appeared to work fine at first, but after a period of time it would start returning errors indicating that the Kerberos token had expired
- Occasionally it also looked as though the wrong user identity had flowed over the wire to the back-end service for method 2
- The point at which calls to method 2 stopped working seemed to coincide with the expiry of the Kerberos ticket (around a week to 10 days; I can't remember the exact time frame, but it matched the cache duration of a Kerberos ticket)
- After an IIS reset things worked straight away for a while, then stopped working again after the time period above
- The error message we were getting was: The Kerberos credential handle could not be acquired. The AcquireCredentialsHandle call returned the following error code: A specified logon session does not exist. It may already have been terminated.
This was very painful to troubleshoot, as the .NET team struggled to recreate the problem in any of their test environments. By adding some instrumentation they were eventually able to work out what was going on.
What was happening
When the development team originally coded method 2 in web service A, they implemented a singleton or caching pattern for the proxy object for web service B, so that they did not need to construct a new instance each time.
When method 2 was called, the WSE policy was applied and the Kerberos token was used as expected. However, the proxy object must internally handle things like WS-SecureConversation and some kind of caching so that the Kerberos token can be reused over the lifetime of that proxy instance. In essence, the identity of the first user to create the object was used for every subsequent call made with that proxy when a Kerberos token was applied.
This would also explain why the calls fail after a period of time: the Kerberos ticket obtained for that first user would have expired.
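The behaviour described above can be reproduced in miniature. The sketch below is a language-neutral simulation written in Python, not WSE code: FakeToken, FakeProxy, current_user and the token lifetime are all hypothetical, made up for illustration. It assumes, as we concluded, that the proxy obtains a token for whoever makes the first call and then keeps reusing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeToken:
    """Hypothetical stand-in for a Kerberos token held by a proxy."""
    user: str
    expires_at: int  # simulated clock tick at which the token stops being valid


class FakeProxy:
    """Hypothetical proxy that acquires a token once and then reuses it."""
    TOKEN_LIFETIME = 10  # arbitrary number of ticks, for illustration only

    def __init__(self) -> None:
        self._token: Optional[FakeToken] = None

    def call(self, current_user: str, now: int) -> str:
        # The token is acquired for whoever makes the first call ...
        if self._token is None:
            self._token = FakeToken(current_user, now + self.TOKEN_LIFETIME)
        # ... and reused afterwards, even once it has expired.
        if now >= self._token.expires_at:
            raise RuntimeError("token expired")
        return self._token.user  # identity the back end would see


# A single shared instance, as in the singleton/caching pattern.
shared_proxy = FakeProxy()

seen_by_backend = [
    shared_proxy.call("alice", now=0),  # first caller: token issued for alice
    shared_proxy.call("bob", now=1),    # bob's call still carries alice's token
]

try:
    shared_proxy.call("bob", now=10)    # the token lifetime has elapsed
    expired = False
except RuntimeError:
    expired = True
```

In this simulation the back end sees alice for both calls, and the call at tick 10 fails, which mirrors the two symptoms we saw for method 2.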
The solution was to change the code in web service A so that the proxy to web service B was neither cached nor a singleton, and a new instance of the proxy was created for each call.
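As a rough illustration of the fix, here is another Python simulation rather than WSE code. FakeProxy and call_backend are hypothetical names. The only point is that the proxy is created inside the call, so whatever it holds onto belongs to the current caller and is thrown away afterwards.

```python
class FakeProxy:
    """Hypothetical proxy that binds to the first identity it sees."""

    def __init__(self) -> None:
        self._user = None

    def call(self, current_user: str) -> str:
        if self._user is None:
            self._user = current_user  # captured once per proxy instance
        return self._user  # identity the back end would see


def call_backend(current_user: str) -> str:
    # New proxy per call: nothing captured for one caller can leak to the next.
    proxy = FakeProxy()
    return proxy.call(current_user)


results = [call_backend("alice"), call_backend("bob")]
```

Each caller now reaches the back end under their own identity. The trade-off is the cost of constructing a proxy on every call.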
Would this apply to WCF?
At this stage I have not tested whether WCF would have the same problem. I do know that you run into issues in various configurations if you haven't disposed of a WCF proxy correctly, so it has become very common practice for developers to dispose of their WCF proxies safely, which tends to make this less of an issue. In WSE 2 or 3, developers are not used to applying a dispose pattern to web service proxies.
Lessons to learn
Some takeaways from this include:
- It is a lot more likely to be your own code than framework code causing obscure problems like this
- Impersonation can be a dangerous thing, so make sure you test and code review any usage of it
Credit goes to Matt Buckley, who found the needle in the stack of needles in the custom .NET application, tracking down the cached proxy instance through extensive analysis of the custom code.