Wednesday, 27 February 2008

Beware DTC configuration in a cluster

Suffered from a severe amount of head scratching today due to using Mutual Authentication in Microsoft DTC.

On one of our client's projects, we have a bunch of applications and a bunch of services. These services use System.Transactions and nest a bunch of transactions together in a DAO layer. They all use TransactionScopeOption.Required, so if a transaction has already started, the code simply enlists in it.

Everything appeared to work fine - the clients were working, the services on the test server were working fine, connecting to our database cluster and everything seemed happy and ready for a dry run test into the live environment. When we moved the services into a cluster however, everything failed - with the ubiquitous "communication with the underlying transaction manager failed" error, so common when DTC isn't configured correctly.

After two days, the operations guys discovered (with a little help from Microsoft support) that the issue was actually because we were using mutual authentication configuration in MSDTC.

Mutual authentication works fine server to server in non-clustered environments. But where a cluster needs to talk out to another server, or two clusters need to talk to each other, mutual authentication fails. Armed with the knowledge of why, it's easy to understand in hindsight.

It boils down to how mutual authentication works. Normally, the mutual authentication conversation between two machines might go like this:

Machine A: "Hi, I'm Machine A, with IP address 192.168.0.1"
Machine B: "Are you really Machine A, with that IP address?"
Machine A: "Yeah!"
Machine B: "Cool"

And everything is happy. However, assume we have two clusters as follows:

  • Cluster A (database cluster, with its own virtual IP)
    • Server A.1 (with its own IP)
    • Server A.2 (with its own IP)
  • Cluster B (services cluster, with its own virtual IP)
    • Server B.1 (with its own IP)
    • Server B.2 (with its own IP)

The conversation goes like this:

Cluster A: "Hi, I'm Cluster A, with the cluster's virtual IP address"
Cluster B: "Are you really Cluster A, with that virtual IP address?"
Server A.1: "No, I'm actually Server A.1, with my own IP address"
Cluster B: "Then I suggest you leave before I call the boys in!"

Clustered servers don't confirm their identity as the cluster's virtual name/IP; instead, they validate as the currently active node in the cluster. This makes mutual authentication fail.
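The mismatch can be modelled in a few lines of Python. This is only a toy sketch of the two conversations above, not how MSDTC or its RPC security is actually implemented; the Endpoint type, the mutual_auth_ok function and all names and addresses are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    address: str


def mutual_auth_ok(claimed: Endpoint, proven: Endpoint) -> bool:
    """Succeed only if the identity the caller proves matches the one it claimed."""
    return claimed == proven


# Placeholder names and addresses (192.0.2.0/24 is a documentation range).
machine_a = Endpoint("machine-a", "192.0.2.10")
cluster_a = Endpoint("cluster-a", "192.0.2.20")  # virtual name/IP of the cluster
server_a1 = Endpoint("server-a1", "192.0.2.21")  # currently active node

# Standalone machine: it claims and proves the same identity.
standalone_result = mutual_auth_ok(claimed=machine_a, proven=machine_a)

# Cluster: the request claims the virtual identity, but the active node proves its own.
clustered_result = mutual_auth_ok(claimed=cluster_a, proven=server_a1)

print(standalone_result)  # True
print(clustered_result)   # False
```

In this model, whichever node happens to be active, the proven identity never equals the virtual one, which is why the failure only shows up once a cluster is involved.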

So the moral of the story - if you're using clusters, you simply cannot use mutual authentication. It does actually say this in the recommended cluster configuration, but it's not very clear - it says "You cannot select mutual authentication" with no explanation, which makes it sound like the option is disabled. So when you realise it's not disabled and is in fact available, you can easily and naturally turn it on, thinking that would be the most secure configuration.

As a final note - we were told that in internal networks, the use of "No authentication" is actually recommended practice and that you don't need to use the higher authentication levels.