Today we're going to cover an issue that is more common than you might think, but one that can be tricky to troubleshoot. The first thing you may think when you read the title of this post is, "What is the Performance team doing trying to figure out why the Cluster Service is acting up?" This issue usually winds up with our Cluster team, but because of the root cause, we collaborate with them on it a lot. Intrigued? Good, read on ...
In this particular scenario, the problem presented as follows: an active-passive two-node Windows Server 2003 print cluster failed over for reasons that were not immediately clear. When we dug into what was going on, we found that Node 1 had lost communication with the rest of the cluster, even though it had not actually dropped its network connection. The Cluster service itself was stuck in a "Starting" state on Node 1. The problem with this state is that there is very little we can do with a service while it is stuck there. Even after we rebooted the server (with the service startup type still set to Automatic), the Cluster service came up and hung in the "Starting" state again. Meanwhile, Node 2 of the cluster was chugging along happily, servicing print requests and behaving itself, but the high-availability print environment was one bad spooler crash away from an administrator's worst nightmare ...