Tuesday, January 11, 2005 11:13 PM
philipnet
When Windows Domain Controllers Attack!
Firstly, a big thank you to Adam, at Microsoft, with whom I spent nearly two and a half hours on the phone.
Today started at about 8:30 am, when I walked into the office. It ended at 18:30, with me feeling totally exhausted and knowing that, whilst today was full of mind-expanding information, tomorrow would be mind-numbing tedium as I am relegated from my position of the guy-they-call-upon-when-it-all-falls-to-pieces-who-is-also-an-I.T.-Technician to being a plain old I.T. Technician.
It began with being told that the majority of the staff couldn’t log on, whilst those who had stayed logged on since yesterday were OK, although they couldn’t then get their email. There were numerous errors in Active Directory about the Domain Controller and Operations Master of a sub-domain being unable to replicate information to and from the parent domain; the other DC in the sub-domain wasn’t happy either. The DNS servers contained corrupt information and the time on my computer was out by about 5 or 6 minutes.
OK, so I know that the last item seems inconsequential but, if you have ever troubleshot NDS and AD, you know that having the same time on all the servers is very important when it comes to ensuring that information is being replicated.
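For anyone wondering why five or six minutes is a big deal: as far as I understand it, Active Directory's Kerberos authentication tolerates a clock difference of five minutes by default, so a machine just outside that window can start failing to authenticate. Here is a rough Python sketch of the kind of check involved. The host names and clock readings are made up for illustration, and the five-minute limit is the default policy value, which may have been changed on your domain.

```python
from datetime import datetime, timedelta

# Assumption: five minutes is the default Kerberos clock-skew tolerance;
# check your own domain policy before relying on this value.
MAX_SKEW = timedelta(minutes=5)


def hosts_out_of_sync(reference, clocks, max_skew=MAX_SKEW):
    """Return {host: offset} for every host whose clock differs from
    the reference time by more than max_skew."""
    offenders = {}
    for host, reading in clocks.items():
        offset = reading - reference
        if abs(offset) > max_skew:
            offenders[host] = offset
    return offenders


# Hypothetical readings, all taken at the same instant.
reference = datetime(2005, 1, 11, 8, 30, 0)
clocks = {
    "parent-dc1": reference + timedelta(seconds=4),
    "child-dc1": reference - timedelta(seconds=50),
    "my-workstation": reference + timedelta(minutes=6),
}

for host, offset in hosts_out_of_sync(reference, clocks).items():
    print(f"{host} is off by {offset.total_seconds():.0f} seconds")
```

In this made-up data only the workstation that is six minutes fast gets flagged; a clock that is exactly five minutes out still passes, because the check is "more than" the tolerance.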
As it stands, all workstations are joined to the child domain, whereas most of our servers, and some older workstations, reside in the parent domain. There is a two-way transitive trust between the domains but, as something between the two was broken, staff using the newer machines couldn’t log on. Using Active Directory Domains and Trusts I verified from both sides that the trust was up and fully functional, so that wasn’t the issue. However, when I tried to force replication of the NTDS from the parent to the child domain from within Active Directory Sites and Services, it failed, saying that the DSA operation failed and claiming that it was a DNS problem. I thought that just adding the entry for the child DC into the DNS would solve it, but that wasn’t to be.
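With hindsight, part of the reason a single host entry doesn't cut it is that domain controllers are located through a set of SRV records under the `_msdcs` zone, not just by name. The Python sketch below simply builds the names of the main locator records you would expect to find for a domain. The domain names are hypothetical (not ours), the helper name is my own, and this is only a subset of what a DC registers, so treat it as a checklist starter, not a diagnostic tool.

```python
def dc_locator_records(domain, forest_root=None):
    """Build the main DC locator SRV record names for a domain.

    forest_root defaults to the domain itself, which is only right
    for a single-domain forest.
    """
    domain = domain.strip(".").lower()
    forest_root = (forest_root or domain).strip(".").lower()
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",      # any domain controller
        f"_kerberos._tcp.dc._msdcs.{domain}",  # Kerberos KDC on a DC
        f"_ldap._tcp.pdc._msdcs.{domain}",     # the PDC emulator
        f"_ldap._tcp.gc._msdcs.{forest_root}", # global catalogue (forest-wide)
    ]


# Hypothetical parent/child naming, for illustration only.
for name in dc_locator_records("child.example.local", "example.local"):
    print(name)
```

Each name printed can then be looked up with whatever DNS tool you prefer to see whether the record exists and points at the right server.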
The morning was dragging on by this point, and now most of the staff were able to log in and get their email – the system almost seemed to be auto-magically fixing itself. Additionally, a colleague had remembered that on a previous occasion when they had Active Directory problems, the time on the client had been important. Armed with that knowledge, and that moving a workstation to the parent domain could fix the problem, he tried to correct some of the remaining non-functional PCs. This was met with limited success, but at least it got a few more users on. (Incidentally, temporarily disconnecting the workstation from the network would allow the rest of the staff to log in, although I never heard back on just how far they could get into the system.)
It’s now after lunch and an ex-student, Adam, has been contacted by one of his old tutors and wants us to run MPSRPT_DirSvc/DirSvc from http://www.microsoft.com/downloads/details.aspx?FamilyID=cebf3c7c-7ca5-408f-88b7-f9c79b7306c0b on a DC in the parent domain. The report from that gets sent off, and then a reply comes back that he wants the DirSvc output from the child domain as well, that there’s crud in the DNS servers/services and that we need to rebuild them. That process won’t take long, but I decided to wait until classes had started and had been running for 5-10 minutes.
Then a reply comes back that things are much more serious than that and that we need to transfer domain operations to a new machine, dc-un-promo the two child domain DCs and then re-add them. We start the process, only to get a call from Adam to say that we might not need to do that. Over the course of the following two and a half hours, Adam talks me through troubleshooting AD, rebuilding the DNS services, and improving their configuration and that of all the DCs, whilst at the same time moaning about the idiot who set it up in the first place – that would be my colleagues, then.
It finally comes to half six and we have properly configured DNS and DC servers, with suitable entries in the DNS for the DCs, and replication is proceeding smoothly between the parent and child domains. Tomorrow begins the process of documenting the changes I made (change control), which will be almost the first pieces of documentation made on our servers(!), and instigating some fine tuning and performance-enhancing features. As well as ensuring that the DCs don’t get out of time sync!