Hey Rob here again, I thought that I would share with you some of the things that we see where Internet Explorer Kerberos authentication fails.
It is important to understand the default behavior of Internet Explorer and its support for Kerberos authentication so that you don’t start ripping out your hair (can’t speak to what Ned does here). I have listed three very common problems that we typically see when Kerberos authentication is failing with web-based applications.
Scenario 1: Website does not use the standard TCP/IP ports (80/443)
Web Server Configuration:
Webserver1 has two different IIS sites running on it.
Website1 runs with a web application pool account assigned to NetworkService.
Website2 runs with a web application pool account assigned to a domain user account.
Website2 is configured to listen on port 8080.
The following SPN’s have been defined on the website2 application pool account.
http/webserver1.contoso.com:8080
http/webserver1:8080
In this scenario you can see why a non-standard port is being used, since multiple websites have been configured on the same web server. When this happens you need to specify the port when you add the Service Principal Name, otherwise there is a high likelihood that you will get a Kerberos ticket for the wrong web application pool account.
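For example, those port-qualified SPNs can be added with SETSPN.EXE (a sketch – the application pool account name here is made up, and older versions of setspn use -A instead of -S):

setspn -S http/webserver1.contoso.com:8080 CONTOSO\Website2AppPool
setspn -S http/webserver1:8080 CONTOSO\Website2AppPool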
Client Workaround
In this scenario you need to make sure that when Internet Explorer accesses Website2, it asks for a Service Principal Name with the port number included. However, the default behavior of IE is to not add the port number to the Kerberos ticket request. When that ticket is presented to IIS you will see a KRB_AP_ERR_MODIFIED message come back.
You will need to use the KB article below to change the default behavior on all IE client versions. For Internet Explorer 6 it requires the QFE branch of Wininet.dll to be installed before the registry change will actually work.
There is no version of this KB article for IE7 and above, but you do have to make the same registry change for those versions of IE as well.
Server Workaround:
There really is not a good workaround to the issue other than to use host headers for one of the websites and add a DNS HOST record for the host header in your DNS configuration. You will see shortly why we are not recommending a CNAME record in DNS.
Scenario 2: CNAME DNS RR is used for a website.
Web Server Configuration:
Webserver1 has two different sites running on it.
Website1 runs with a web application pool account assigned to NetworkService.
Website2 runs with a web application pool account assigned to a domain user account.
Website2 is configured to use a host header of app1.contoso.com.
In DNS a CNAME record for app1.contoso.com was created and pointed to webserver1.contoso.com HOST record.
The following SPN’s have been defined on the website2 application pool account.
http/app1.contoso.com
http/app1
In this scenario it appears that this should work just fine. When a user goes to app1.contoso.com the client machine is going to do a DNS lookup, and the DNS server is going to respond with the CNAME record and point to the webserver1.contoso.com HOST record. We can also see that the Service Principal Name configuration is properly configured on the web application pool account for website2.
The default behavior of Internet Explorer is to generate the Kerberos ticket request for the HOST record that is returned from a CNAME record, not the actual CNAME record itself. So IE specifically asks for a Kerberos ticket for http/webserver1.contoso.com which will result in a Kerberos ticket being generated encrypted with the WebServer1 computer’s password. This will in turn generate a KRB_AP_ERR_MODIFIED from IIS back to IE when the user attempts to visit the app1.contoso.com website.
Client Workaround
You will need to use the KB articles below to change the default behavior on all IE versions. For IE 6 it requires the QFE branch of Wininet.dll to be installed before the registry key change will actually work.
For Internet Explorer 6:
911149 Error message in Internet Explorer when you try to access a Web site that requires Kerberos authentication on a Windows XP-based computer: "HTTP Error 401 - Unauthorized: Access is denied due to invalid credentials" - http://support.microsoft.com/default.aspx?scid=kb;EN-US;911149
Server Workaround:
The only workaround is to remove the DNS CNAME RR and replace it with a HOST RR.
Scenario 3: Website works on first access but starts failing 30 minutes later
Web Server Configuration:
Webserver1 has one site on it.
Website1 runs with a web application pool account assigned to a domain user account.
The following SPN’s have been defined on the website1 application pool account.
http/webserver1.contoso.com
When a user visits the website they use the NETBIOS computer name for the URL to visit. For example: http://webserver1.
In this scenario there does not seem to be much wrong here, except that there is only the FQDN version of the Service Principal Name defined. Yeah, I know all of our documentation around Kerberos always states to add FQDN as well as NETBIOS name versions of the SPN. Believe it or not, we see this all the time where the user did not register both of them, but stick with me here.
The default behavior of Internet Explorer is to append the computer’s DNS suffix (or to use the DNS suffix search order, if one is defined on the machine) to whatever the user types in the URL if it is not a dotted name. If your DNS configuration is correct it will resolve to webserver1.contoso.com. Once IE finds this name it stores the DNS entry in its own DNS cache. Just like most caches it times out - for IE’s cache it is 30 minutes. After 30 minutes IE has to resolve the name again; however, this time it does not try DNS resolution again, it tries just NetBIOS name resolution (hopefully there is WINS in the environment; otherwise it will just fail). Based on your configuration you could expect one of the following Kerberos errors:
KRB_AP_ERR_MODIFIED – Expect to get this error if the web site name is the same as your web server’s computer name. That is because IE is going to ask for an http/webserver1 SPN, which will resolve to HOST/webserver1, which is assigned to the computer account.
KDC_ERR_S_PRINCIPAL_UNKNOWN – Expect to get this error if the web site name is something like app1.contoso.com. That is because IE is going to ask for an http/app1 SPN, which will not resolve to any account in the domain.
Client Workaround:
Thankfully there is a fix that can be implemented for Internet Explorer.
There is no version of this KB article for IE7 and above, but you still have to make the same registry key change for those versions of IE before the functionality is supported.
Server Workaround:
You can register the NetBIOS version of the Service Principal Name to the account, using SETSPN.EXE.
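For example (again a sketch with a made-up application pool account name; use -A instead of -S with pre-2008 versions of setspn):

setspn -S http/webserver1 CONTOSO\Website1AppPool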
I hope that you found this post interesting. As always, it is easier to spot these types of issues by reviewing a network trace taken at the client side (where IE is being used) to find the root cause of the issue.
Ned here again. Are you using MS Dynamics CRM? Be sure to check this excellent blog post from our colleagues Jeremy Morlock and Henning Petersen on how CRM uses Service Principal Names and what you need to get it all working:
Hi folks, Ned here again. This week we hunt down some documentation gremlins and give them a well-deserved smack.
Also, things will be a bit slow next week as I will be out in Redmond teaching this rotation of Microsoft Certified Masters. Never heard of it? If you’re at the IT career tipping point, this may be just what the doctor ordered. No really, it is, and I will be there!
What exactly does the dcdiag.exe /fix command do? According to this it fixes the SPNs on the DC machine account. But according to this it ensures that SRV records are appropriately registered (I thought the NetLogon service did this?!). And what exactly does the netdiag.exe /fix command do? This article says it "fixes minor problems", whatever that means.
Answer
1. Dcdiag /fix writes back the computer account’s AD replication SPN (DRSUAPI with an index value of “E3514235-4B06-11D1-AB04-00C04FC2DCD2”) entry only. More info on this SPN here:
If someone (else!) has destroyed all the other SPN’s, you will need to recreate them or restart whichever service recreates them. For example if the DFSR SPN goes missing, you restart the DFSR service and it will get put back.
2. Netdiag /fix reads the %systemroot%\system32\config\Netlogon.dns file and attempts to register all records in DNS.
I confirmed both in source code, regardless of what old TechNet goo states. :-)
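If you want to see exactly which records netdiag /fix would try to register, the file is plain text; and on newer DCs you can also ask NetLogon to re-register its records directly (a sketch – verify the nltest switch is available on your OS version):

type %systemroot%\system32\config\netlogon.dns
nltest /dsregdns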
Question
In Win2008, DFSR has been improved with asynchronous RPC connections and 16 concurrent connections for upload and download. Do you have any further info on how much the performance will improve from Win2003 R2 to Win2008/2008 R2? Are there any other factors that would drive me to start rolling out the later OS versions?
Answer
I will be posting some new info about performance improvements in 2008/2008 R2 as well as registry tuning options in the coming weeks. But we don’t have any specific case studies that I am aware of yet – I’ll see if I can find them, and if you do, feel free to comment. We do have some rather unspecific ones, if you’re interested.
From testing and customer experience though, we see anywhere from a 4 to 20 times performance improvement of 2008 over 2003 R2, depending on a variety of factors that are often very customer specific (network speed, bandwidth, latency, loss rates, errors, overall uptime + memory + CPU + disk subsystem + drivers). Not only did DFSR improve, but the OS got improvements and it makes better use of newer hardware. Besides the RPC and other changes, Win2008 tweaks the DFSR credit manager, and 2008 R2 really improves it – much more evenly-distributed replication with greatly lowered chance of servers being starved by updates.
Other factors:
Win2003 enters extended support on July 13 2010. This means no further hotfixes that improve reliability or performance, and 5 years of crossed fingers until end of life.
You would now have the option on DC’s to switch to DFSR-enabled SYSVOL and no longer use FRS there.
If deploying 2008 R2, you would also gain read-only and cluster support, which is unavailable in 2003/2008.
Question
I am using your old blog post on making custom registry changes and…
Answer
Ewwwww… The only reason to use that old document is if you are still running Windows 2000 somewhere. Otherwise you should be busting out Group Policy Preferences and wowing your friends and family.
Oh, and really? You’re running Win2000? That’s very uncool of you…
Question
I am doing USMT migrations with /SF. What is that switch and why are my migrations absolutely busted to heck?
Answer
This one came in late last week and was so gnarly that it ended up generating a whole blog post. Read more here. Sometimes your questions to us generate more than a Friday reply.
Hello folks, Ned here again. After a week in Las Colinas Texas, the blog migration, and Jonathan’s attempted coup, we are still standing. Since I’m sure your whole day has been designed around this post I won’t keep you waiting.
I am testing RODC’s in a WAN scenario, where the RODC is in a branch site. When the WAN is taken offline, some users cannot logon even when I have cached their passwords. Other users can logon but not access other resources using Kerberos authorization, like file shares and what not.
Answer
Make sure that the computers in that branch site are allowed to cache their passwords also. This means that those computers need to be added into the Password Replication Policy allow list via DSA.MSC. For example:
If a user tries to logon to a computer that cannot itself create a secure channel and logon to a DC, that user will receive the error “The trust relationship between this workstation and the primary domain failed”.
If users can logon to their local computers, but then try to access other resources that require a Kerberos ticket-granting service ticket for those resource computers, and those computers are not able to logon to the domain, users will see something like:
The error “The system detected a possible attempt to compromise security” is the key; the exact dialog may change – in this case I was trying to connect to a share.
You will also see “KDC_ERR_SVC_UNAVAILABLE” errors in your network captures from the RODC. Here I am using a workstation called 7-04-x86-u to try and browse the shares on a file server called 2008r2-06-fn (which is IP address 10.70.0.106). My RODC 2008r2-04-f has a KDC that keeps getting TGS requests that it cannot fulfill since that 06 server cannot logon. So now you see all the SMB (i.e. CIFS) related TGS issues below:
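If you would rather manage the Password Replication Policy allow list from a command line instead of DSA.MSC, the Win2008 R2 AD PowerShell module has cmdlets for it (a sketch – it reuses the RODC name from above, and the group names are made up):

Add-ADDomainControllerPasswordReplicationPolicy -Identity "2008R2-04-F" -AllowedList "Branch1 Users","Branch1 Computers"
Get-ADDomainControllerPasswordReplicationPolicy -Identity "2008R2-04-F" -Allowed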
Does DFSR talk to the PDC Emulator like DFS Namespace root servers do?
Answer
Nope, it locates DC’s just like your computer does when you logon – through the DC Locator process. So if everything is working correctly, any DC’s in the same site are the primary candidates for LDAP communication.
Question
I understand that DFSR uses encrypted RPC to communicate, but the details are kind of lacking. Especially around what specific cipher suite is used. Can you explain a bit more?
Answer
DFSR uses RPC_C_AUTHN_GSS_NEGOTIATE with Kerberos required, with Mutual Auth required, and with Impersonation blocked. The actual encryption algorithm depends on the algorithms Kerberos supports on that OS. On Windows 2003 that would be RC4-HMAC (or DES technically, but that would never be used normally). On Win2008 and Win2008 R2 it would be AES-256. DFSR doesn’t really care what the encryption is, it just trusts Kerberos to take care of it all within RPC (and this means that you can replace “DFSR” here with “pretty much any Windows RPC application, as long as it uses Negotiate with Kerberos”). Both AES 128 and AES 256 are very strong block cipher suites that meet FIPS compliance and no one is close to breaking them in the foreseeable future.
Not really an AD thing, but is Windows 7 able to use the Novell IPX network protocol?
Answer
Nope. Windows XP/2003 were the last Microsoft operating systems to include IPX support. Novell stopped including IPX when they released their client for Vista/2008:
Novell Client for Windows XP/2003 Features Not Included in the Novell Client for Windows Vista/2008
IPX/SPX™ protocols and API libraries.
Question
What settings should I configure for Windows Security Auditing? What’s recommended?
Answer
That’s a biiiiig question and it doesn’t have a simple answer. The most important thing to consider when configuring auditing – and the one that hardly anyone ever asks – is “what are you trying to accomplish?” Just turning on a bunch of auditing is wrong. Just turning on one set of auditing you find on the internet, a government website, or through some supposed “security auditing” company is also wrong – there is no one size fits all answer, and anyone that says there is can be discarded.
Decide what type of information you want to gain by collecting audit events – what are you going to do with this audit data.
Consider the resources that you have available for collecting and reviewing an audit log – not just cost of deployment, but reviewing, acting upon it, etc. Operational costs.
Collect and archive the logs using something like ACS. The forensic trail is very short in the event log alone.
Don’t just turn on auditing without having a plan for those three points. Start by reviewing our auditing best practices guide. Then review Eric Fitzgerald’s excellent blog post “Keeping the noise down in your security log.” It has one of the best points ever written about auditing:
“5. Don't enable "failure" auditing, unless you have a plan on what to do when you see one (that doesn't involve emailing me ;-) and you are actually spending time on a regular basis following up on these events.
You might or might not realize, that auditing in general is a potential denial-of-service attack on the system. Auditing consumes system resources (CPU & disk i/o and disk space) to record system and user activity. Success auditing records activity of authenticated users performing actions which they've been authorized to perform. This somewhat limits the attack, since you know who they are, and you've allowed them to do whatever it is that you're auditing. If they try to abuse the system by opening the audited file a million times, you can go fire them.
Failure auditing allows unauthenticated or unauthorized users to consume resources. In the worst case, a logon failure event, a remote user with no credentials can cause consumption of system resources.”
Make sure you are not impacting performance with your auditing – another good Eric read here. Understand exactly what it is your auditing will tell you by reviewing:
Finally, for some general sample template security settings, take a look at the Security Compliance Manager tool.
There must have been something in the water this week, as I got asked this by a dozen different customers, askds readers, and MS internal folks. Weird.
Question
When running the AD PowerShell cmdlet Get-ADComputer -Properties *, it always returns:
Get-ADComputer : One or more properties are invalid.
Parameter name: msDS-HostServiceAccount
At line:1 char:15
+ Get-ADComputer <<<< srv1 -Properties *
    + CategoryInfo : InvalidArgument: (srv1:ADComputer) [Get-ADComputer], ArgumentException
    + FullyQualifiedErrorId : One or more properties are invalid. Parameter name: msDS-HostServiceAccount,Microsoft.ActiveDirectory.Management.Commands.GetADComputer
Not using -Properties *, or using other cmdlets, worked fine.
Answer
Rats! Well, this is not by design or desirable. If you are seeing this issue then you are probably using the add-on "AD Management Gateway" PowerShell service on your Win2003 and Win2008 DC's, and have not yet deployed any Windows Server 2008 R2 DC’s. You don’t have to roll out Win2008 R2, but you do need to update the AD schema to version 47 – i.e. Windows Server 2008 R2. Steps here, and as always, test your forest schema upgrade in your lab environment first.
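Until you can update the schema, a simple workaround is to ask only for the properties you actually need instead of using *, for example:

Get-ADComputer srv1 -Properties OperatingSystem,OperatingSystemServicePack,whenCreated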
Howdy partners, Ned here. This week we talk event logs, auditing, NTLM “fallback”, file server monitoring, and SCOM 2007 management pack dissection. It was a fairly quiet week for questions since everyone is off for vacation at this point, I reckon. That didn't mean it wasn't crazy at work - our folks take vacation too, and that leaves fewer of us to handle the cases. Hopefully you weren't on hold too long...
Oh, and it’s my fifth anniversary as an employee at Microsoft today. So being from the Midwest and not wanting to do the usual Microsoft M&M cliché, I brought 5 pounds of delicious Hickory Farms meat. It disappeared fast, people here are animals. Sausage-loving animals.
This does not help you on age, but if you are archiving the log every time it fills you get the same effect. Obviously you would need to start backing up all these event logs and deleting them or you would risk running out of disk space. And what about Windows Server 2003, you ask? We have a registry key there that will do the same thing – see the AutoBackupLogFiles value buried in KB312571.
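For reference, that Win2003 value lives under the Security log’s Eventlog key – a sketch based on KB312571 (check the article for the companion retention settings before relying on this alone):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\Eventlog\Security" /v AutoBackupLogFiles /t REG_DWORD /d 1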
Rather than going this route though, I instead suggest deploying some kind of security event collection tool, like System Center 2007’s free ACS component or a third party. It will scale much better and be less of a hassle to maintain. Then you are always intercepting and collecting your security events. Hopefully you have a plan to do something with them!
Question
<A conversation about why you should not skew clocks as that makes Kerberos break, as everyone knows. But then:>
However the vast majority of app servers should “just work” with NTLM fallback when Kerberos doesn’t work, correct?
Answer
Not necessarily! When MS started implementing Kerberos eleven years ago, NTLM was being replaced as the preferred security protocol. However, we knew that a million apps and down-level or 3rd party clients would not be able to handle Kerberos through negotiation. In order to make the experience less painful, we decided that when using the Windows Negotiate security package, we’d allow applications to first try Kerberos and if that failed, then try NTLM. Pretty much any failure was ok, such as the target server not supporting Kerberos or Kerberos being possible but malfunctioning due to environmental problems. If you simply asked for Kerberos only or NTLM only, there was no fallback because you were being specific. Some languages also provide for blocking fallback post negotiation, such as WCF’s ALLOWNTLM=FALSE flag. So NTLM fallback was never guaranteed or even tried in many scenarios. There are a lot of misunderstandings and mythology about this out there, but this is how it works - when it comes to your specific app, just test it under a network capture to see how it behaves.
Then starting with Windows Vista SP1 and Windows Server 2008 we made a significant change – from then on, interactive logon stopped allowing NTLM fallback if Kerberos had errors. So for example, if someone duplicated a DC’s SPN, the user cannot logon (with error “The security database on the server does not have a computer account for this workstation trust relationship”); examining their event log would show a KDC event 11 error and you'd see 4625 events in the DC security log. So if Kerberos was supposed to work and didn’t, too bad - no more fallback. Obviously that is also in place in Windows 7 and Win2008 R2, and for the foreseeable future.
Furthermore, in Windows 7 and Windows Server 2008 R2 we added a new extension to the negotiate security package to start making fallback less likely everywhere, not just in interactive logon. That is called negoexts and does stuff like federation support – from the beginning it has no concept of fallback at all.
So why change all this? Because it’s more secure. Better to prevent auth rather than allow someone to somehow damage Kerberos then use that opportunity to go through a weaker protocol.
Question
I would like to start examining the File Services management pack and other MP for System Center Operations Manager 2007. I don’t always find complete documentation on what these packs do (or I find this). I’d also rather not download and configure 1.25GB of SCOM trial edition just yet either.
2. Install the management packs you are interested in (such as File Services MP).
3. Start the Authoring Console and load your management pack. Generally, the “Library” MP will contain the majority of info – that’s why it’s bigger than the other files. For our File Services example:
7. If you click Edit Monitor, then the Product Knowledge tab, you can see how the monitor works, what the known causes are of the issue, what the resolutions are, and more info to take in. This is the part that makes you smart.
This works anywhere; you don’t even need to install a server – I am doing this all on my Win7 client.
And what this really highlights is just how important using these monitors is. The resolution sections are written by the Product Group to tell you the appropriate way to fix things and in many cases are also vetted and expanded on by MS Support. I spent a hellish few weeks going through the File Services one, for example: 200 pages of spec, arrrrghhhh! Rather than relying on some uninformed stranger on the Internet you can instead get the official answer to each problem that SCOM finds, and it can even react on your behalf. It’s slick stuff.
Finally
I learned an important lesson today. When you are in a team meeting and you describe some broken process as “a real goat rodeo”, your colleagues will use the opportunity to remind you how short you are using terrible artwork:
Hello folks, Ned here again with another ridiculously overdue Friday Mail Sack. This week we talk about patching, admin rights, Kerberos, hiring, ADMT, and PKI. Next week we talk about… nothing. I will be out celebrating an Important Wife Birthday™ and unless Jonathan takes pity on you, there will be crickets. So bother him A LOT for me, would you?
What are the best practices for installing security updates on Domain Controllers? I always transfer the FSMO roles before rebooting any DC, is it correct, wrong? Is there anything else I should monitor or do before or after the restarts?
Answer
There’s no requirement that you move the FSMO roles as none of them need to be online for general domain functionality in the short term; heck, I had one customer with a PDCE offline for more than a year before they noticed – nice! Even if something awful happens and the DC doesn’t immediately come online, most of the FSMO roles serve no immediate purpose (like Schema Master and Domain Naming Master) or are used in only a periodic/failsafe role (RID, PDCE, Infrastructure) where a few minutes won't matter.
The important things are pretty common sense, but I’ll repeat them:
1. Make sure not all DC’s are rebooted at once; stagger them out a bit.
2. Make sure clients are not pointing to DC’s acting as DNS servers that are all being rebooted at once.
3. Make sure you are using a patching system so you don’t miss DC’s; these include WSUS, SCCM, or a third party.
4. Do it all off hours to minimize service interruption and maximize recovery time if a DC doesn’t want to come back up!
Question
What group do I use to install security updates on DC’s and member servers if I do not want those users to be Administrators?
Answer
It’s called “Power Unicorn Operators”.
:-D
No such group. Non-admins cannot install patches and security updates, and this is very much by design. If they could, they could also uninstall them – making a system unsecure. If they could, they could also install malware masquerading as patches and security updates, compromising a system. Use WSUS (free), SCCM ($), Automatic Updates with “download and install” (free), or a third party ($). Installing updates by hand is only going to work for admins, but even then it’s a poor management solution. Just ask all my Conficker-infected customers that were using that methodology…
I suppose it is an authentication issue (Kerberos?), but I cannot prove it – am I right?
Answer
You are correct, it is Kerberos. :-)
When a domain-joined client starts up and talks to an AD DC, it must use Kerberos as NTLM is not allowed for computer-to-computer communication. When given a network resource, it needs to be able to pass that host and service info off to the KDC to request a TGS ticket. For that to work, you have to be able to take that computer/service info and use it to find a Service Principal Name, and that starts with a computer or user principal.
So when you give it an IP address, there is no way to get a SPN, and therefore no way to get Kerberos. So it fails, expectedly and by design. You need to use FQDN (or if you must, NetBIOS name). You will see all this in a network capture as well.
Question (Continued)
The key was “…NTLM is not allowed for computer-to-computer communication...”
That really makes sense now :-). But staring at a network trace, captured during XP startup, I noticed the PC is looking for the SPN CIFS/10.20.30.40 (when I used the DC’s IP address in the startup script path). I was tempted, so I added this SPN to the servicePrincipalName attribute of the DC’s object in the lab. After restarting both machines, the startup script ran even with the DC’s IP in the file path (i.e. \\10.20.30.40\sysvol\netlogon\script1.cmd).
Sounds logical, but is it practical? I suppose this is one of the “do not ever do this!” things? What would be the impact (security/design) if I add SPN like this?
Answer
Oh you sneaky engineers in the field, always clever and always hacking. :-)
Possible, yes. Practical, no. For a few reasons:
1. The computer will not self-maintain that SPN, unlike the other SPN’s.
2. This means you will need to maintain this on all SPN’s for all file servers.
3. It also means you need to remember to change this when IP addresses change, or serious confusion will ensue.
4. It also means all IT staff will need to know this, since you will not be there forever and you may like taking vacation from time to time.
5. It also means that if anyone forgets any of this, huge numbers of computers will not be getting policy/scripts and unless you are monitoring all client event logs, you won’t know it.
6. Update Jan 21 2011: and starting in Vista, it won't work at all!
So all that adds up to not recommended, leaning towards highly discouraged. Not to mention that pointing to a specific server isn’t needed when using DFSN (such as with SYSVOL). This will work perfectly well and guarantee the computer talks to the nearest DC first, then continue to work if that DC is down:
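For example, something along these lines (a hedged reconstruction reusing the script name from above, not the exact path from the original post):

\\contoso.com\netlogon\script1.cmd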
Naughty naughty, you did not read the requirements. You are trying to install this on a Windows Server 2008 or Windows Server 2003 computer running x86 (32-bit). ADMT 3.2 only installs on Windows Server 2008 R2. Since that can only be x64, the installer was only compiled in 64-bit. When you run x64 on x86, you naturally get that error.
If you tried to install this on Win2003/2008 X64, it would instead say that it requires Win2008 R2.
Question
I’ve not seen the Weekly DS KB articles from the AD Team blog for a while…. Is it because there aren't any? Or are you just no longer providing those?
Answer
No, Craig just got a bit behind. He plans to resume that soon. Soon being sometime between now and the zombie apocalypse.
Holy crap, do you believe we put pictures like that in Office Clipart?! We must give kids nightmares.
Question
Is constrained delegation between different domains (with trusted relationship) ever going to be supported? Maybe Windows Server 2014, Windows Server 2020, Windows server 2096, etc. ;-)
Answer
While I cannot speak about future releases, this is definitely something we get asked about all the time. When you ask us over and over for something, that helps make it more likely - not guaranteed, mind you - to happen. So if you have a Premier contract, whale on your TAM and let them know you need this functionality and why. The more compelling the argument and the more often it is made, the more likely to get examined for a future release. This goes for pretty much everything in Windows.
Question
We just created a new Win2008 R2 PKI (one Root CA and one Issuing Sub CA). We have two domains, so we placed the CA’s in our child domain, as we have an empty forest root domain. Should we have placed those CA’s in the empty root?
Answer
[This answer courtesy of Rob Greene – Ned]
I would recommend that you put the CA’s in the domain where the largest number of certificate requests are going to be generated. I say this because if you configure your certificate templates to publish certificates to AD, the CA computer will contact a local domain controller to write them to the directory. Less traffic, less hopping, generally more efficient.
The other thing I would recommend is to add the CA’s computer account to the Cert Publishers group in both the child and root domains. This allows the CA to publish certificates for users / computers in both domains.
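If you would rather script that than click through the consoles, here is a sketch using the 2008 R2 AD module – the CA computer name and DC name are made up, and this assumes the Win2003-or-later behavior where Cert Publishers is a domain local group that can hold members from other domains; test it in a lab first:

# Add the CA's computer account to Cert Publishers in its own (child) domain
Add-ADGroupMember -Identity "Cert Publishers" -Members (Get-ADComputer CA01)
# And to Cert Publishers in the empty forest root domain
Add-ADGroupMember -Identity "Cert Publishers" -Members (Get-ADComputer CA01) -Server rootdc01.contoso.com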
Question
I heard you are hiring, what are some good things to study up on if I want to interview and really rock your face off?
Answer
Start below. These are the core technologies - mainly as represented in XP and 2003 – that every DS Support Engineer has to know inside and out to be worth a darn in MS Support. Once you have those down you can find the Vista/08/7/R2 differences on your own.
Note: you can use these free trial editions below in order to do live repros of all this, and repros are highly suggested. Especially with the use of Netmon 3.4 to see how things look on the wire. Running these in Hyper-V, in Virtualbox, etc. will make the materials more understandable.
Next time I’ll give some links to the post-graduate level studying. Most people think they know these above really well… then the hyperventilating starts in the interview.
Heya, Ned here again. Since this another of those catch up mail sacks, there’s plenty of interesting stuff to discuss. Today we talk NSPI, DFSR, USMT, NT 4.0 (!!!), Win2008/R2 AD upgrades, Black Hat 2010, and Irish people who live on icebergs.
A vendor told me that I need to follow KB2019948 to raise the number of “NSPI max sessions per user” from 50 to 10,000 for their product to work. Am I setting myself up for failure?
Answer
Starting in Windows Server 2008 global catalogs are limited to 50 concurrent NSPI connections per user from messaging applications. That is because previous experience with letting apps use unlimited connections has been unpleasant. :) So when your vendor tells you to do this, they are putting you in the position where your DC’s will be allocating a huge number of memory pages to handle what amounts to a denial of service attack caused by a poorly written app that does not know how to re-use sessions correctly.
We wrote an article you can use to confirm this is your issue (BlackBerry Enterprise Server currently does this and yikes, Outlook 2007 did at some point too! There are probably others):
The real answer is to fix the calling application so that it doesn’t behave this way. As a grotesque bandage, you can use the registry change on your GC’s. Make sure these DC’s are x64 OS and not memory bound before you start, as it’s likely to hurt. Try raising the value in increments before going to something astronomical like 10,000 – it may be that significantly fewer are needed per user and the vendor was pulling that number out of their butt. It’s not like they will be the ones on the phone with you all night when the DC tanks, right?
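If you do end up making the change, the value described in KB2019948 is a DWORD on the GC – a sketch (double-check the value name and path against the article), shown here raising the limit to a modest 100 rather than 10,000:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters" /v "NSPI max sessions per user" /t REG_DWORD /d 100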
Question
I have recently started deploying Windows Server 2008 R2 as part of a large DFSR infrastructure. When I use the DFS Management (DFSMGMT.MSC) snap-in on the old Win2008 and Win2003 servers to examine my RG’s, the new RG’s don’t show up. Even when I select “Add replication groups to display” and hit the “Show replication groups” button I don’t see the new RG’s. What’s up?
Answer
We have had some changes in the DFSMGMT snap-in that intentionally lead to behaviors like these. For example:
See the difference? The missing RG names give a clue. :)
This is because the msDFSR-Version attribute on the RG gets set to “3.0” when creating an RG with clustered memberships or an RG containing read-only memberships. Since a Win2003 or Win2008 server cannot correctly manage those new model RG’s, their snap-in is not allowed to see it.
In both cases this is only at creation time; if you go back later and do stuff with cluster or RO, then the version may not necessarily be updated and you can end up with 2003/2008 seeing stuff they cannot manage. For that reason I recommend you avoid managing DFSR with anything but the latest DFSMGMT.MSC. The snap-ins just can’t really coexist effectively. There’s never likely to be a backport because – why bother? The only way to have the problem is to already have the solution.
Question
Is there a way with USMT 4.0 to take a bunch of files scattered around the computer and put them into one central destination folder during loadstate? For example, PST files?
Answer
Sure thing, USMT supports a concept called “rerouting” that relies on an XML element called “locationModify”. Here’s an example:
<migration urlid="http://www.microsoft.com/migration/1.0/migxmlext/pstconsolidate">
  <component type="Documents" context="System">
    <displayName>All .pst files to a single folder</displayName>
    <role role="Data">
      <rules>
        <include>
          <objectSet>
            <script>MigXmlHelper.GenerateDrivePatterns ("* [*.pst]", "Fixed")</script>
          </objectSet>
        </include>
        <!-- Migrates all the .pst files in the store to the C:\PSTFiles folder during LoadState -->
        <locationModify script="MigXmlHelper.Move('C:\PSTFiles')">
          <objectSet>
            <script>MigXmlHelper.GenerateDrivePatterns ("* [*.pst]", "Fixed")</script>
          </objectSet>
        </locationModify>
      </rules>
    </role>
  </component>
</migration>
The <locationModify> element allows you to choose from the MigXmlHelpers of RelativeMove, Move, and ExactMove. Move is typically the best option as it just preserves the old source folder structure under the new parent folder to which you redirected. ExactMove is less desirable as it will flatten out the source directory structure, which means you then need to explore the <merge> element and decide how you want to handle conflicts. Those could involve various levels of precedence (where some files will be overwritten permanently) or simply renaming files with (1), (2), etc. added to the tail. Pretty gross. I don’t recommend it and your users will not appreciate it. RelativeMove allows you to take from one known spot in the scanstate and move it to another known spot in the loadstate.
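To actually use the custom XML, pass it to both tools with /i: – a sketch with a made-up store path and file name, alongside whatever other config XML files you normally include:

scanstate \\server\migstore\%computername% /i:migdocs.xml /i:pstconsolidate.xml /v:13 /c
loadstate \\server\migstore\%computername% /i:migdocs.xml /i:pstconsolidate.xml /v:13 /c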
Question
I’m running into some weird issues with pre-seeding DFSR using robocopy with Win2008 and Win2008 R2, even when following your instructions from an old post. It looks like my hashes are not matching as I’m seeing a lot of conflicts. I also remember you saying that there will be a new article on pre-seeding coming?
Answer
1. Make sure you install this QFE, which fixes several problems with ACL’s and other elements not correctly copying in 2008/2008 R2 – all file elements are used by DFSR to calculate the SHA-1 hash, so anything being different (including security) will conflict the file:
973776 The security configuration information, such as the ACL, is not copied if a backup operator uses the Robocopy.exe utility together with the /B option to copy a file on a computer that is running Windows Vista or Windows Server 2008 http://support.microsoft.com/default.aspx?scid=kb;EN-US;973776
2. Here’s my recommended robocopy syntax. You will want to ensure that the base folders (where you are copying from and to) have the same security and inheritance settings prior to copying, of course.
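A typical command along those lines, based on the published DFSR pre-seeding guidance (a sketch with made-up paths – not necessarily the exact switches from the original post):

robocopy "D:\RF01" "\\member2\d$\RF01" /E /B /COPYALL /R:6 /W:5 /XD DfsrPrivate /LOG:C:\preseed.log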
3. If you are using Windows Server 2008 R2 (or have a Win7 computer lying around), you can use the updated version of DFSRDIAG.EXE that supports the FILEHASH command. It will allow you to test and see if your pre-seeding was done correctly before continuing:
C:\>dfsrdiag.exe filehash
Command "FileHash" or "Hash" Help:
Displays a hash value identical to that computed by the DFS Replication service for the specified file or folder

Usage: DFSRDIAG FileHash </FilePath:filepath>
   </FilePath> or </Path>   File full path name
                            Example: /FilePath:d:\directory\filename.ext
It only works on a per-file basis, so it’s either for “spot checking” or you’d have to script it to crawl everything (probably overkill). So you could do your pre-seeding test, then use this to check how it went on some files:
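For example, a quick loop like this from a command prompt would hash a sample of files on one member so you can compare against the same files on the other (a sketch; the path and pattern are made up):

for %f in (D:\RF01\*.docx) do dfsrdiag filehash /filepath:"%f"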
Still working on the full blog post, sorry. It’s big and requires a lot of repro and validation, just needs more time – but it had that nice screenshot for you. :)
Question
Can a Windows NT 4.0 member join a Windows Server 2008 R2 domain?
Can Windows7/2008 R2 join an NT 4.0 domain?
Can I create a two-way or outbound trust between an NT 4.0 PDC and Windows Server 2008 R2 PDCE?
Short Snarky Answer
Yes, but good grief, really!?!
No.
Heck no.
Long Helpful Answer
If you enable the AllowNt4Crypto Netlogon setting and all the other ridiculously insecure settings required for NT 4.0 below you will be good to go. At least until you get hacked due to using a 15 year old OS that has not gotten a security hotfix in half a decade.
942564 The Net Logon service on Windows Server 2008 and on Windows Server 2008 R2 domain controllers does not allow the use of older cryptography algorithms that are compatible with Windows NT 4.0 by default http://support.microsoft.com/default.aspx?scid=kb;EN-US;942564
Windows 7 and 2008 R2 computers cannot join NT 4.0 domains due to fundamental security changes. No, this will not change. No, there is no workaround.
940268 Error message when you try to join a Windows Vista, Windows Server 2008, Windows 7, or Windows Server 2008 R2-based computer to a Windows NT 4.0 domain: "Logon failure: unknown user name or bad password" http://support.microsoft.com/default.aspx?scid=kb;EN-US;940268
Windows Server 2008 R2 PDCE’s cannot create an outbound or two-way trust to NT 4.0 due to fundamental security changes. We have a specific article in mind for this right now, but KB942564 was updated to reflect this also. No, this will not change. No, there is no workaround.
The real solution here is to stop expending all this energy to be insecure and keep ancient systems running. You obviously have newer model OS’s in the environment, just go whole hog. Upgrade, migrate or toss your NT 4.0 environments. Windows 2000 support just ended, for goodness sake, and it was 5 years younger than NT 4.0! For every one customer that tells me they need an NT 4.0 domain for some application to run (which no one ever actually checks to see if that’s true, because they secretly know it is not true), the other nineteen admit that they just haven’t bothered out of sheer inertia.
Let me try this another way – go here: http://www.microsoft.com/technet/security/bulletin/summary.mspx. This is the list of all Microsoft security bulletins in the past seven years. For five of those years, NT 4.0 has not gotten a single hotfix. Windows 2000 – remember, not supported now either– has gotten 174 security updates in the past four years alone. If you think your NT 4.0 environment is not totally compromised, it’s only because you keep it locked in an underwater vault with piranha fish and you keep the servers turned off. It’s an OS based on using NTLM’s challenge response security, which people are still gleefully attacking with new vectors.
You need Kerberos.
Question
We use a lot of firewalls between network segments inside our environment. We have deployed DFSR and it works like a champ, replicating without issues. But when I try to gather a health report for a computer that is behind a firewall, it fails with an RPC error. My event log shows:
Event Type: Error
Event Source: DCOM
Event Category: None
Event ID: 10006
Date: 7/15/2010
Time: 2:51:52 PM
User: N/A
Computer: SRVBEHINDFIREWALL
Description: DCOM got error "The RPC server is unavailable."
Answer
If replication is working with the firewall but health reports are not, it sounds like DCOM/WMI traffic is being filtered out. Make sure the firewalls are not blocking or filtering the DCOM traffic specifically; a later model firewall that supports packet inspection may be deciding to block the DCOM types of traffic based on some rule. A double-sided network capture is how you will figure this out – the computer running MMC will connect remotely to DCOM over port 135, get back a response packet that (internally) states the remote port for subsequent connections, then the MMC will connect to that port for all subsequent conversations. If that port is blocked, no report.
For example, here I connect to port 135 (DCOM/EPM) and get a response packet that contains the new dynamic listening port to connect to for DCOM – that port happens to be 55158 (but will differ every time). I then connect to that remote port in order to get a health diagnostic output using the IServerHealthReport call. If you create a double-sided network capture, you will likely see the first conversation fail, or, if it succeeds, the subsequent conversation failing – failing because the firewall drops the packets and they never appear on the remote host. That’s why you must use double-sided captures.
I know USMT cannot migrate local printers, but can it migrate TCP-port connected printers?
Answer
No, and for the same reason: those printers are not mapped to a print server that can send you a device driver and they are (technically) also a local printer. Dirty secret time: USMT doesn’t really migrate network printers, it just migrates these two registry keys:
HKCU\Printers\Connections
HKCU\Printers\DevModes2
So if your printer is in those keys, USMT is win – and the only kind that live there are mapped network printers. When you first logon and access the printer on your newly restored computer, Windows will just download the driver for you and away you go. Considering that you are in the middle of this big migration, now would be a good time to get rid of these old (wrong?) ways of connecting printers. Windows 7 has plenty of options for printer deployment through group policy, group policy preferences, and you can even make the right printers appear based on the user’s location. For example, here’s what I see when I add a printer here at my desk – all I see are the printers in my little building on the nearest network. Not the ones across the street, not the ones I cannot use, not the ones I have no business seeing. Do this right and most users will only see printers within 50 feet of them. :)
To quote from the book of Bourdain: That does not suck.
Question
What are the best documents for planning, deploying, and completing a forest upgrade from Win2000/2003 to Win2008/2008R2? [Asked at least 10 times a week – Ned]
If you are planning a domain upgrade, this should be your new homepage until the operation is complete. It is fantastic documentation with checklists, guides, known issues, recommended hotfixes, and best practices. It’s the bee’s knees, the wasp’s elbows, and the caterpillar's feets.
These folks claim they have a workable attack on Kerberos smart card logons. Except that we’ve had a way to prevent the attack for three years, starting in Vista using Strict KDC Validation – so that kinda takes the wind out of their sails. You can read more about how to make sure you are protected here and here and soon here. Pretty amazing also that this is the first time – that I’ve heard of, at least – in 11 years of MS Kerberos smart cards that anyone was talking attacks past the theoretical stage.
Of 102 topics, 10 are directly around Microsoft and Windows attacks. 48 are around web, java, and browser attacks. How much attention are you giving your end-to-end web security?
10 topics were also around attacking iPhones and Google apps. How much attention are you giving those products in your environment? They are now as interesting to penetrate as all of Windows, according to Black Hat.
5 topics on cloud computing attacks. Look for that number to double next year, and then double again the year after. Bet on it, buddy.
Finally, remember my old boss Mike O’Reilly? Yes, that guy that made the Keebler tree and who was the manager in charge of this blog and whom I worked with for 6 years. Out of the blue he sends me this email today – using his caveman Newfie mental gymnastics:
Ned,
I never ever read the Askds blog when I worked there. I was reading it today and just realized that you are funny.
Hello world, Ned here again. I’m back to write this week’s mail sack – just in time to be gone for the next two weeks on vacation and work travel. In the meantime Jonathan and Scott will be running the show, so be sure to spam the heck out of them with whatever tickles you. This week we discuss DFSR, Certificates, PKI, PowerShell, Audit, Infrastructure, Kerberos, NTLM, Active Directory Migration Tool, Disaster Recovery, and not-art.
I need to understand what the difference between the various AD string type attribute syntaxes are. From http://technet.microsoft.com/en-us/library/cc961740.aspx : String(Octet), String(Unicode), Case-Sensitive String, String(Printable), String(IA5) et al. While I understand each type represents a different way to encode the data in the AD database, it isn't clear to me:
Why so many?
What differences are there in managing/using/querying them?
If an application uses LDAP to update/read an attribute of one string type, is it likely to encounter issues if the same routine is used to update/read a different string type?
Answer
Active Directory has to support data-storage needs for multiple computer systems that may use different standards for representing data. Strings are the most variable data to be encoded because one has to account for different languages, scripts, and characters. Some standards limit characters to the ANSI character set (8-bit) while others specify another encoding type altogether (IA5 or PrintableString for X.509, for example).
Since Active Directory needs to store data suitable for all of these various systems, it needs to support multiple encodings for string data.
Management/query/read/write differences will depend very much on how you access the directory. If you use PowerShell or ADSI to access the directory, some level of automation is involved to properly handle the syntax type. PowerShell leverages the System.String class of the .NET Framework which handles, pretty much invisibly, the various string types.
Also, when we are talking about the 255-character extended ANSI character set, which includes the Latin alphabet used in English and most European Languages, then the various encodings are pretty much identical. You really won't encounter much of a problem until you start working in 2-byte character sets like Kanji or other Eastern scripts.
Question
Is it possible / advisable to run the CA service under an account different from SYSTEM with EFS enabled for some files that should not be read by system or would another solution be more appropriate?
Answer
No, running the CA service under any account other than the default (Local System) is not supported. Users who are not trusted for Administrator access to the server should not be granted those rights.
[PKI and string type answers courtesy of Jonathan Stephens, the “Blaster” in our symbiotic “Master Blaster” relationship – Ned]
It’s complicated and we’re getting this ironed out. Jonathan is going to create a whole blog post on how User Kerberos can function perfectly without a Kerberos Trust, or with an NTLM trust, or with no trust. It’s all smoke and mirrors basically – you don’t need a trust in all circumstances to use User Kerberos. Heck, don’t even have to use a domain-joined computer. For now, disregard that article please.
Question
I followed the steps outlined in this blog post: http://blogs.msdn.com/b/ericfitz/archive/2005/08/04/447951.aspx. Works like a champ and I see the data correctly in the Event Viewer. But when I try to use PowerShell 2.0 on one of those Win2003 DC’s with this syntax:
This appears to be an issue in the PowerShell 2.0 Get-EventLog cmdlet on Win2003 where an incorrect value is being displayed. You won’t have the issue on Win2008/2008 R2 – I verified. Hopefully one of our Premier contract customers will report this issue so we can investigate further and see what the long term fix options are.
In the meantime though, here’s some sample workaround code I banged up using an alternative legacy cmdlet Get-WmiObject to do the same thing (including returning the latest event only, which makes this pretty slow):
Get-WmiObject -query "SELECT * FROM Win32_NTLogEvent Where Logfile = 'Security' and EventCode=566" | sort timewritten -desc | select -first 1
A better long term solution (for both auditing and PowerShell) is to get your DC’s running Win2008 R2.
Question
Do you have suggestions for pros/cons on breaking up a large DFSR replication group? One of our many replication groups has only one replicated folder. Over time that folder has gotten to be a bit large with various folders and shares (hosted as links) nested within. Occasionally there are large changes to the data and the replication backlog obviously impacts the ENTIRE folder. I have thought about breaking the group into several individual replication folders, but then I begin to shudder at the management overhead and monitoring all the various backlogs, etc.
Is there a smooth way to transition an existing replication group with one replicated folder into one with many replicated folders? By "smooth" I mean no disruption to current replication if at all possible, and without re-replicating the data.
What are the major pros/cons on how many replicated folders a given group has configured?
Answer
There’s no real easy answer – any change of membership or replicated folder within an RG means a re-synch of replication; the boundaries are discrete and there’s no migration tool. The fact that a backlog is growing won’t be helped by more or fewer RG/RF combos though, unless the RG/RF’s now involve totally different servers. Since the DFSR service’s inbound/outbound file transfer model is per server, moving things around locally doesn’t change backlogs significantly*.
So:
No way to do this without total replication disruption (as you must rebuild the RG’s/RF’s in DFSR from scratch; the only saving grace here is if you don’t have to move data, you would get some pre-seeding for free).
Since each RF would still have its own staging/conflictanddeleted/installing/deleted folders, there’s not much performance reasoning behind rolling a bunch of RF’s into a single RG. And no, you cannot use a shared structure. :) The main point of an RG is administrative convenience: delegation is configured at an RG level, for example, so if you had a file server admin that ran all the same servers that were replicating… stuff… it would be easier to organize those all as one RG.
* As a regular reader though, I imagine you’ve already seen this, which has some other ways to speed things up; that may help some of the choke ups:
It is – to be blunt – a kludge in our current implementation.
Question
I am working on an inter-forest migration that will involve a transitional forest hop. If I have to move the objects a second time to get them from the transition forest into our forest, will I lose the original SID History that is in the SID History attribute?
Answer
You will end up with multiple SID history entries. It’s not an uncommon scenario: customers who have been through multiple acquisitions and mergers often end up with multiple SID histories. As far as authorization goes, it works fine and having more than one is fine:
Contains previous SIDs used for the object if the object was moved from another domain. Whenever an object is moved from one domain to another, a new SID is created and that new SID becomes the objectSID. The previous SID is added to the sIDHistory property.
The real issue is user profiles. You have to make sure that ADMT profile translation is performed so that after users and computers are migrated the ProfileList registry entries are updated to use the user’s real current SID info. If you do not do this, when you someday need to use USMT to migrate data it will fail as it does not know or care about old SID history, only the SID in the profile and the current user’s real SID.
Do you know if there is any problem with creating a DNS record with the name ldap.contoso.com? Or maybe there will be some problems with other components of Active Directory if there is a record called “LDAP”?
Answer
Windows certainly will not care and we’ve had plenty of customers use that specific DNS name. We keep a document of reserved names as well, so if you don’t see something in this list, you are usually in good shape from a purely Microsoft perspective:
I'm currently working on a migration to a Windows Server 2008 R2 AD forest – specifically the Disaster Recovery plan. Is it a good idea to take one of the DCs offline, and after every successful adprep operation bring it back online? Or, in case something goes bad, use this offline one to recreate the domain?
That way no matter what happens under any circumstances (not just adprep), you have a way out. You can’t imagine how many customers we deal with every day that have absolutely no AD Disaster Recovery system in place at all.
2. Use the arrows to see more of the style options, and you’ll see the one called “Relaxed Perspective, White”. Select that and your picture will now look like a three dimensional piece of paper.
Hello folks, Ned here again. By now many businesses have begun deploying Windows Server 2008 R2 and Windows 7. Since Active Directory has become ubiquitous, Kerberos is now commonplace. What you may not know is that we made a significant change to default cryptographic support in Kerberos starting in Win7/R2 and if you are not careful, it may break some of your environment: by default, the DES encryption type is no longer enabled.
For those who have homogenous Windows networks with no third party operating systems or appliances, and who have not configured DES for any user accounts, you can stop reading.
Ok, one guy left. Everyone else pay attention.
Some Background on Kerberos Encryption Types
The phrase “encryption type” is simply another way of saying cryptography. Windows supports many cipher suites in order to protect Kerberos from being successfully attacked and decrypted. These suites use different key lengths and algorithms; naturally, the newer the cipher suite we support and use, the more secure the Kerberos.
Encryption                 Key length   MS OS supported
AES256-CTS-HMAC-SHA1-96    256-bit      Windows 7, Windows Server 2008 R2
AES128-CTS-HMAC-SHA1-96    128-bit      Windows Vista, Windows Server 2008 and later
RC4-HMAC                   128-bit      Windows 2000 and later
DES-CBC-MD5                56-bit       Windows 2000 and later; off by default in Win7/R2
DES-CBC-CRC                56-bit       Windows 2000 and later; off by default in Win7/R2
In practical terms, a Windows computer starts a Kerberos conversation by sending a list of supported encryption types (ETypes). The KDC responds to that list with the most secure encryption type they both support. For example, a Windows 7 computer sends an AS_REQ. The AS_REQ contains the supported encryption types of AES256, AES128, RC4, and DES (only because I enabled it through security policy) – we can see this in a network capture:
The client uses a password hash to encrypt a key. The client uses the encrypted key to protect the time stamp that it includes in the “real” AS_REQ. In this instance, the preferred encryption used is AES256, the highest level of encryption supported by Win7 and 2008 R2:
I use Netmon 3.4 for the above examples (I’ll explain its importance later). As you can see, it’s un-fun to parse Kerberos traffic with it. This is how it looks in Wireshark; sometimes it’s easier to read for learning purposes:
DES (Data Encryption Standard) came about in the late 1970’s as a standardized encryption suite. Since then it’s been adopted by a lot of software; it’s probably one of the most supported ciphers in the world. It’s also quite insecure and no version of Windows has ever used it by default when talking to Windows KDCs; the minimum there has always been 128-bit RC4-HMAC. Starting in Windows 7, we decided that Windows, out of the box, would no longer support DES… at all. You’re in good shape as long as you don’t have any operating systems other than Windows.
The problem is people use other operating systems (and may not even know it; your appliance web proxy is running a 3rd party operating system, bub). Those operating systems are not always configured to use Kerberos security at the highest cipher levels and often do not support negotiation or pre-authentication. Also, they may not support AES ciphers. And certain applications might require DES encryption due to short-sighted programming or default settings.
This leaves you in a pickle: do you roll the dice and deploy Windows Server 2008 R2 DC’s and Windows 7 clients, hoping that there are no issues? Or do you enable DES on all your new computers using group policy, knowing you are enabling a cipher that weakens Kerberos? I think there’s a third option… that’s better…
Finding Kerberos EType usage in your domain
We document some simple steps for finding DES usage in your domain in KB977321 using network captures in a test environment. But wouldn’t it be easier to determine Kerberos usage based on security auditing, so that you could gather, analyze, and query the data? You can. All this requires is that you have DCs running at least Windows Server 2008.
1. If you have already deployed Windows Server 2008 R2 and have enabled DES everywhere to err on the side of app compatibility, then configure security auditing for success against all DCs:
Kerberos Authentication Service
Kerberos Service Ticket Operations
These auditing actions are part of the Account Logon category. For more details on these review these two KBs:
And no, this doesn’t work with Windows Server 2003 DC’s. Who cares, DES can’t be touched there… :)
Depending on the size of your environment or quantity of auditing events, you may need to use some sort of security event log harvesting service like ACS. It will make querying your data easier. There are third parties that make these kinds of apps as well.
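If you’d rather flip those two subcategories on directly at an elevated prompt on each DC (instead of through audit policy), a minimal sketch with auditpol.exe looks like this – note that group policy-based audit settings can override local changes:

# Enable success auditing for the two Kerberos subcategories (both live in the Account Logon category).
auditpol.exe /set /subcategory:"Kerberos Authentication Service" /success:enable
auditpol.exe /set /subcategory:"Kerberos Service Ticket Operations" /success:enable
# Verify the result.
auditpol.exe /get /category:"Account Logon"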
2. Drink mint juleps for a few days.
3. Examine your security audit event logs on your DC’s. Here is where it gets interesting. A few examples:
-------------------------
Log Name:Security
Source:Microsoft-Windows-Security-Auditing
Date:10/13/2010 5:06:47 PM
Event ID:4769
Task Category: Kerberos Service Ticket Operations
Level:Information
Keywords:Audit Success
User:N/A
Computer:2008r2-01-f.contoso.com
Description:
A Kerberos service ticket was requested.
Account Information:
Account Name:krbned@CONTOSO.COM
Account Domain:CONTOSO.COM
Logon GUID:{eed17165-1ca0-613b-51ae-17005546c7f0}
Service Information:
Service Name:2008R2-01-F$
Service ID:CONTOSO\2008R2-01-F$
Network Information:
Client Address:::ffff:10.70.0.221
Client Port:49203
Additional Information:
Ticket Options:0x40810000
Ticket Encryption Type:0x12
Failure Code:0x0
Transited Services:-
-------------------------
Log Name:Security
Source:Microsoft-Windows-Security-Auditing
Date:10/12/2010 10:32:29 AM
Event ID:4768
Task Category: Kerberos Authentication Service
Level:Information
Keywords:Audit Success
User:N/A
Computer:2008r2-01-f.contoso.com
Description:
A Kerberos authentication ticket (TGT) was requested.
Account Information:
Account Name:krbned
Supplied Realm Name:CONTOSO
User ID:CONTOSO\krbned
Service Information:
Service Name:krbtgt
Service ID:CONTOSO\krbtgt
Network Information:
Client Address:::ffff:10.70.0.115
Client Port:1088
Additional Information:
Ticket Options:0x40810010
Result Code:0x0
Ticket Encryption Type:0x17
Pre-Authentication Type:2
Certificate Information:
Certificate Issuer Name:
Certificate Serial Number:
Certificate Thumbprint:
------------------------
These “Ticket Encryption Type” values look mighty interesting. But what is a 0x17? Or a 0x12? Is there a complete list of what these all mean?
Use Netmon to decipher these values. First though, I’ll let you in on a little secret: Netmon 3 exists mainly as part of our efforts to document our protocols for the EU and the DOJ. That’s why when you look at the parsing in the frame details page it is designed more for completeness than readability. You get to reap the rewards of this, as it’s why the Netmon parsers are not monolithic – instead, they allow easy viewing and even live editing, all loaded from text.
1. Go back and look at that network capture screenshot I showed previously:
All of those ETypes have a number in parentheses. But they aren’t hex numbers. And from looking at my event logs above, for example, the etype 0x12 came from a Windows 7 computer; that has to be AES-256, which the above screenshot shows is a value of 18.
I just gave you a big hint. :)
2. Take a look at the parsers tab in Netmon, specifically for Protocols, then KerberosV5:
3. Take a look at the KrbETypeTable entry – look familiar? Here’s where those numbers are coming from that get displayed in the parser:
Table KrbETypeTable( eType )
{
Switch( eType )
{
Case 1: FormatString("des-cbc-crc (%d)", eType);
Case 2: FormatString("des-cbc-md4 (%d)", eType);
Case 3: FormatString("des-cbc-md5 (%d)", eType);
Case 4: FormatString("[reserved] (%d)", eType);
Case 5: FormatString("des3-cbc-md5 (%d)", eType);
Case 6: FormatString("[reserved] (%d)", eType);
Case 7: FormatString("des3-cbc-sha1 (%d)", eType);
//9 through f in both RFC 3961 and MCPP
Case 9: FormatString("dsaWithSHA1-CmsOID (%d)", eType);
Case 10: FormatString("md5WithRSAEncryption-CmsOID (%d)", eType);
Case 11: FormatString("sha1WithRSAEncryption-CmsOID (%d)", eType);
Case 12: FormatString("rc2CBC-EnvOID (%d)", eType);
Case 13: FormatString("rsaEncryption-EnvOID (%d)", eType);
Case 14: FormatString("rsaES-OAEP-ENV-OID (%d)", eType);
Case 15: FormatString("des-ede3-cbc-Env-OID (%d)", eType);
Case 16: FormatString("des3-cbc-sha1-kd (%d)", eType);
Case 17: FormatString("aes128-cts-hmac-sha1-96 (%d)", eType);
Case 18: FormatString("aes256-cts-hmac-sha1-96 (%d)", eType);
Case 0x17: FormatString("rc4-hmac (%d)", eType);
Case 0x18: FormatString("rc4-hmac-exp (%d)", eType);
Case 0x41: FormatString("subkey-keymaterial (%d)", eType);
And what do you think happens if I use calc to convert decimal 18 to hex? Indeed – you get 0x12. Which is aes256-cts-hmac-sha1-96, and that’s what your event log was trying to tell you. So all converted out, this means that the theoretical event log entries could be:
Hex    Etype
0x1    des-cbc-crc
0x2    des-cbc-md4
0x3    des-cbc-md5
0x4    [reserved]
0x5    des3-cbc-md5
0x6    [reserved]
0x7    des3-cbc-sha1
0x9    dsaWithSHA1-CmsOID
0xa    md5WithRSAEncryption-CmsOID
0xb    sha1WithRSAEncryption-CmsOID
0xc    rc2CBC-EnvOID
0xd    rsaEncryption-EnvOID
0xe    rsaES-OAEP-ENV-OID
0xf    des-ede3-cbc-Env-OID
0x10   des3-cbc-sha1-kd
0x11   aes128-cts-hmac-sha1-96
0x12   aes256-cts-hmac-sha1-96
0x17   rc4-hmac
0x18   rc4-hmac-exp
0x41   subkey-keymaterial
And if you want to catch DES usage, you should watch for events that include 0x1 and 0x3, as those are the versions of DES that Windows implements. Tada…
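If you would rather not eyeball thousands of events, here is a rough PowerShell sketch (mine, not from the KB) that pulls 4768/4769 events from a DC’s Security log and keeps only the DES ones:

# Scan the Security log for Kerberos tickets issued with DES (0x1 = des-cbc-crc, 0x3 = des-cbc-md5).
Get-WinEvent -LogName Security -FilterXPath "*[System[(EventID=4768 or EventID=4769)]]" |
  ForEach-Object {
    $xml = [xml]$_.ToXml()
    # Pull fields out of the event XML by name, since field positions differ between 4768 and 4769.
    $etype  = ($xml.Event.EventData.Data | Where-Object { $_.Name -eq 'TicketEncryptionType' }).'#text'
    $client = ($xml.Event.EventData.Data | Where-Object { $_.Name -eq 'IpAddress' }).'#text'
    if ('0x1','0x3' -contains $etype) {
      New-Object PSObject -Property @{Time=$_.TimeCreated; EventId=$_.Id; Etype=$etype; Client=$client}
    }
  }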
Regardless of whether or not you care about Kerberos DES parsing, you can use these techniques to reverse engineer protocols based on the Netmon parser code or even fix parser errors. It’s a slick technique to keep in your back pocket. If you just wanted to cheat you could have looked these up in RFC 3961. This is “teaching to fish” time :).
Ok, now what?
It’s all well and good to know that you have software using DES in your environment. The next step is to change that behavior. Here are your tasks:
Make sure you have no DES-enabled user accounts in your domain (a sample query for finding them follows this list).
Use your event log audit trail to create an inventory of computers sending DES etypes. Examine those computers and devices (they are probably not running Windows).
If the computers are running Windows, examine them for non-Microsoft software. One of those is the culprit. Netmon, Process Monitor, Process auditing, etc. can all be used to track down which process is requiring the insecure protocol. Contact the vendor about your options to alter the behavior.
If the computers are not running Windows or they are appliances, examine their local Kerberos client configurations or contact their vendors. You will also need to look at the installed apps as the OS might not be to blame (but it usually is).
If you get stuck with a vendor that refuses to stop using DES, contact their sales department and make a stink. Sales will usually be your advocate, as they want your money so they can buy more BMW M3’s. Using DES at this point is terminal laziness or the sign of a vendor that absolutely does not care at all about security – probably not someone with which you want to do business.
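For that first bullet, here is a hedged sketch using the AD PowerShell module: accounts with “Use Kerberos DES encryption types for this account” checked have the USE_DES_KEY_ONLY bit set in userAccountControl, so you can query for it directly:

# Find user accounts with the USE_DES_KEY_ONLY flag (0x200000 = 2097152) set.
# 1.2.840.113556.1.4.803 is the LDAP bitwise-AND matching rule OID.
Get-ADUser -LDAPFilter '(userAccountControl:1.2.840.113556.1.4.803:=2097152)' |
  Select-Object Name, SamAccountName, DistinguishedName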
Final thoughts
This post wasn’t a treatise on Kerberos or even encryption types, naturally. If you want a lot more interesting reading (and an insomnia cure), I recommend:
Hi folks, Ned here again with your questions and our answers. This is a pretty long one; looks like everyone is back from vacation, winter storms, and hiding from the boss. Today we talk Kerberos, KCC, SPNs, PKI, USN journaling, DFSR, auditing, NDES, PowerShell, SIDs, RIDs, DFSN, and other random goo.
It’s a sticky question – MS does not make a NIC teaming solution, so you are at the mercy of 3rd party vendor software and if there are any issues, we cannot help other than to break the team. So the question you need to answer is “do you trust your NIC vendor support?”
Generally speaking, we are not huge fans of NIC teaming, as we see customers having frequent driver issues and because a DC probably doesn’t need it. If clients are completely consuming 1Gbit or 10Gbit network interfaces, the DC is probably being overloaded with requests. Doubling that network would make things worse; it’s better to add more DCs. And if the DC is also running Exchange, file server, SQL, etc. you are probably talking about an environment without many users or clients.
A failover NIC solution is probably a better option if your vendor supports it. Meaning that the second NIC is only used if the first one burns out and dies, all on the same network.
Question
We used to manually create SPNs with IP addresses to allow Kerberos without network name resolution. This worked in Windows XP and 2003 but stopped working in later operating systems. Is this expected?
Answer
Yes it is. Starting in Windows Vista and forever more, the OS examines the format of the SPN being requested, and if it is only an IP address, Kerberos is not even attempted. There’s no way to override this behavior. If I look at it in practical terms, having manually set an IP address as an SPN:
This is why in this previous post – see the “I want to create a startup script via GPO” and “NTLM is not allowed for computer-to-computer communication” sections – I highly discouraged customers from this sort of hacking. What I didn’t realize when I wrote the old post was that I now have the power to control the future with my mind.
I see that the DFSR staging folder can be moved, but can the Conflict and Deleted (\dfsrprivate\conflictanddeleted) folder be relocated? If so, how?
Answer
It cannot be moved or renamed – this was once planned (and there is even an AD attribute that makes one think the location could be specified) but it never happened in the service code. Regardless of what you put in that attribute, DFSR ignores it and creates a C&D folder at the default location.
For example, here I specified a completely different C&D path using ADSIEDIT.MSC before DFSR even created the folder. Once I started the DFSR service, it ignored my setting and created the conflict folder with defaults:
We are trying to find the best way to issue Active Directory "User" certificates to iPhones and iPads, so these users can authenticate to our third party VPN appliance using their "user" certificate. We were thinking that MS NDES could help us with this. Everything I have read says that NDES is used for non domain "computer or device" enrollment.
Just because the certificate template that is used by NDES must be of type computer does not mean you cannot build a SCEP protocol message to the NDES Server for use by a user account on the iPhone in question.
Keep in mind that the SCEP protocol was designed by Cisco for their network appliances to be able to enroll for certificates online. Also understand what NDES means - Network Device Enrollment Service.
Realistically there is no reason why you cannot enroll for a certificate via the SCEP interface with NDES and have a user account use the issued certificate. However, NDES is coded to specifically allow enrollment only for computer-based certificate templates. If you put a user-based template name in the registry for it to issue, it will fail with a not-so-easily deciphered message.
That said, keep in mind that the Subject or Subject Alternative Name field identifies the user of the certificate, not the template.
So what you could do is:
Duplicate the computer certificate template.
Then change the subject to “Supply in the Request”
Then give the template a unique name.
Make sure that the NDES account and Administrator have security access to the template for Enroll.
Assign the Template to be issued.
Then you need to assign the template to one of the purposes in the NDES registry (You might want to use the one for both signing and encrypting). See the blog.
Now that you have a certificate with the Client Authentication EKU and a subject/SAN of the user account, I don’t see why you could not use it for what you need. Not that I have tested this or can test this, mind you…
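To put a rough shape on that registry step – assuming the standard MSCEP value names (EncryptionTemplate, SignatureTemplate, GeneralPurposeTemplate) and a completely made-up template name – the change might look something like this:

# Hypothetical template name; these MSCEP values control which template NDES issues.
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Cryptography\MSCEP' -Name GeneralPurposeTemplate -Value 'UserCertViaSCEP'
# Restart IIS so the NDES application rereads the value.
iisreset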
Question
Is there a “proper” USN journal setting versus replicated data sizes, etc. on the respective volumes housing DFSR data? I've come across USN journal wrap issues (that properly self-heal ... and then occur again a month or so later). I’m hoping to find a happy medium for USN journal sizing versus the size of the volume or the data that resides on that volume.
Answer
I did a quick bit of research - in the history of all MS DFSR support cases, it was necessary to increase the USN journal size for five customers – not exactly a constant need. Our recommendation is not to alter it unless you get multiple 2202 events that can’t be fixed any other way:
Event ID=2202 Severity=Warning The DFS Replication service has detected an NTFS change journal wrap on volume %2. A journal wrap can occur for the following reasons: 1.The USN journal on the volume has been truncated. Chkdsk can truncate the journal if it finds corrupt entries at the end of the journal. 2.The DFS Replication service was not running on this computer for an extended period of time. 3.The DFS Replication service could not keep up with the rate of file changes on the volume. The service has automatically initiated the journal wrap recovery process.
Additional Information: Volume: %1
Since you are getting multiple 2202 occurrences, I would recommend first figuring out why you are getting the journal wraps. The three reasons listed in the event need to be considered – the first two are avoidable (fix your disk or controller and stop turning the service off) and should be handled without a need to alter the USN journal.
The third one may mean you are not using DFSR as recommended, but that may be unavoidable. In that case, set the USN journal size to 1GB and validate that the issue stops occurring. We have no real formula here (remember, only five customers ever), but if you cannot spare another 512MB on the drive, you have much more important disk capacity problems to consider. If that is still not enough, revisit whether DFSR is the right solution for you – the amount of change would have to be so incredibly rapid that I doubt DFSR could ever realistically keep up and converge. And make sure that nothing else is constantly updating files outside the replicated folders on that volume – there is only one journal per volume and it contains entries for all files, even the ones not being replicated!
Just to answer the inevitable question: you use WMI to increase the USN journal size.
On Win2003 R2 only:
1. Determine the volume in question (USN journals are volume specific) and the GUID for that volume by running the following:
WMIC.EXE /namespace:\\root\Microsoftdfs path DfsrVolumeInfo get VolumePath
WMIC.EXE /namespace:\\root\Microsoftdfs path DfsrVolumeInfo get VolumeGUID
2a. Raise the USN Journal Size (for one particular volume):
WMIC /namespace:\\root\microsoftdfs path dfsrvolumeconfig.VolumeGuid="%GUID%" set minntfsjournalsizeinmb=%MB SIZE%
where you replace '%GUID%' with the volume GUID and '%MB SIZE%' with a larger USN size in MB. For example:
WMIC /namespace:\\root\microsoftdfs path dfsrvolumeconfig.VolumeGuid="D1EB0B66-9403-11DA-B12E-0003FFD1390B" set minntfsjournalsizeinmb=1024
This will return 'Property Update Successful' for that GUID.
2b. Raise the USN Journal Size (for all volumes):
WMIC /namespace:\\root\microsoftdfs path dfsrvolumeconfig set minntfsjournalsizeinmb=%MB SIZE%
This will return 'Property Update Successful' for ALL the volumes.
3. Restart server for new journal size to take effect in NTFS.
Update 4/15/2011 - On Win2008 or later:
1. Open Windows Explorer.
2. In Tools | Folder Options | View, uncheck 'Hide protected operating system files'.
3. Navigate to each drive's 'System Volume Information\dfsr\config' folder (you will need to add 'Administrators, Full Control' to avoid an access denied error).
4. In Notepad, open the 'Volume_%GUID%.xml' file for each volume you want to increase.
5. There will be a set of tags in that file containing the minimum journal size in MB (512 by default).
6. Stop the DFSR service.
7. Change '512' to the new increased value.
8. Close and save the file, and repeat for any other volumes whose journal size you want to increase.
9. Start the DFSR service back up.
Question
There is a list of DFS Namespace events for Server 2000 at http://support.microsoft.com/kb/315919. I was wondering if there is a similar list of Windows 2008 DFS Event Log Messages?
Answer
That event logging system in KB315919 exists only in Win2000 – Win2003 and later OSs don’t have it anymore. That KB is a bit misleading also: these events will never write unless you enable them through registry settings.
Registry Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Diagnostics
Value name: RunDiagnosticLoggingDfs
Value type: REG_DWORD
Value data: 0 (default: no logging), 2 (verbose logging)
Registry Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfs
Value name: DfsSvcVerbose
Value type: REG_DWORD
Value data: any one of the following three values:
0 (no debug output)
1 (standard debug output)
0x80000000 (standard debug output plus additional Dfs volume call info)
Value name: IDfsVolInfoLevel
Value type: REG_DWORD
Value data: any combination of the following three flags:
0x00000001 (Error)
0x00000002 (Warning)
0x00000004 (Trace)
Dave and I scratched our heads and in our personal history of supporting DFSN, neither of us recalled ever turning this on or using those events for anything useful. Not that it matters now, Windows 2000 is as dead as fried chicken.
Question
We currently have inherited auditing settings on a lot of files and folders that live on our two main DFSR servers. The short story is that before the migration to DFSR, the audit settings were apparently added by someone to the majority of the files/folders. This was replicated by DFSR and now is set on both servers. Thankfully we do not have any audit policies turned on for those servers currently.
That is where the question comes in: there may be a time in the relatively near future that we will want to enable some auditing for a subset of files/folders. Any suggestions on how we could remove a lot of the audit entries on these servers, without forcing nearly every file to get processed by DFSR?
Answer
Nope, it’s going to cause an unavoidable backlog as DFSR reconciles all the security changes you just made – the audit security is part of the file just like the discretionary security. Don’t do that until you have a nice big change control window open. Maybe just do some folders at a time.
In the future, using Global Object Access Auditing would be an option (if you have Win2008 R2 on all DFSR servers). Since it is all derived by LSA and not directly stamped on the files, DFSR won’t replicate anything – the files are never actually changed. It’s slick:
In theory, you could get rid of the auditing currently in place and just use GOAA someday when you need it. It’s the future of file auditing, in my opinion; using direct SACLs on files should be discouraged forever more.
Question
Does the SID for an object have to be unique across the entire forest? It is pretty clear from existing documentation that the SID does have to be unique within a domain because of the way the RID Master distributes RID pools to the DCs. Does the RID Master in the Forest Root domain actually keep track of all the unique base SIDs of all domains to ensure that there is no accidental duplication of the unique base domain SIDs?
Answer
A SID will be unique within a forest, as each domain has a unique base SID that combines with a RID. That’s why there’s a RID master per domain. There is no reasonable way for the domain SIDs to ever be duplicated by Windows, although I have seen some third party products that made it happen. All hell broke loose; we don’t plan for the impossible. :) Even if you use ADMT to migrate users with SID History within a forest, it will not be duplicated as the migration will always destroy the old user when it is “moved”.
The RID Masters don’t talk to each other within the forest (any more than they would between different forests, where a duplicate SID would cause just as many problems when you tried to create a trust). The base SID is a random 48 bit number, so there is no reasonable way it could be duplicated by accident in the same environment. It comes down to us relying on the odds of two domains that know of each other ending up with the same SID through pure random chance – highly unlikely math.
You’ll also find no mention of inter-RID master communication requirements or error messages here:
“A USN journal loss occurred 2 times in the past 7 days on E:. DFS Replication monitors the USN journal to detect changes made to the replicated folder. Although DFS Replication automatically recovers from this problem, replication stops temporarily for replicated folders stored on this volume. Repeated journal loss usually indicates disk issues. Event ID: 2204”
Is this how the health report indicates a journal wrap, or can I take “loss” literally?
Answer
Ouch. That’s not a wrap, the journal was deleted or irrevocably damaged. I have never actually seen that event in the field, only in a test lab where I deleted my journal intentionally (using the nasty command: FSUTIL.EXE USN DELETEJOURNAL). I would suspect either a failing disk or 3rd party disk management software. It’s CHKDSK and disk diagnostic time for you.
The recovery process for event 2204 is similar to a wrap; the journal gets recreated, then repopulated like a wrap recovery (it uses the same code). You get event 2206 to let you know that it’s fixed.
Question
How come there is no “Set-SPN” cmdlet in AD PowerShell?
-ServicePrincipalNames <hashtable> Specifies the service principal names for the account. This parameter sets the ServicePrincipalNames property of the account. The LDAP display name (ldapDisplayName) for this property is servicePrincipalName. This parameter uses the following syntax to add, remove, replace, or clear service principal name values.
Syntax:
To add values: -ServicePrincipalNames @{Add=value1,value2,...}
To remove values: -ServicePrincipalNames @{Remove=value3,value4,...}
To replace values: -ServicePrincipalNames @{Replace=value1,value2,...}
To clear all values: -ServicePrincipalNames $null
You can specify more than one change by using a list separated by semicolons. For example, use the following syntax to add and remove service principal names. @{Add=value1,value2,...};@{Remove=value3,value4,...}
The operators will be applied in the following sequence: Remove, Add, Replace.
The following example shows how to add and remove service principal names: -ServicePrincipalNames @{Add="SQLservice\accounting.corp.contoso.com:1456"};@{Remove="SQLservice\finance.corp.contoso.com:1456"}
We do not have any special handling to retrieve SPNs using Get-AdComputer or Get-Aduser (nor any other attributes – they treat all as generic properties). For example:
get-adcomputer name -properties serviceprincipalnames | select-object -expand serviceprincipalnames
I used select-object -expand because when you get a really long returned list, PowerShell likes to start truncating the readable output. Also, when I don’t know which cmdlets support which things, I sometimes cheat and use educated guesses:
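The screenshot that originally went here is gone, but the kind of cheating I mean is roughly this (made-up example – wildcard the cmdlet names, then poke at the parameters):

# Guess which AD cmdlets exist for a noun...
Get-Command *-ADComputer
# ...then check whether the one you want actually has the parameter you hope for.
(Get-Command Set-ADComputer).Parameters.Keys | Where-Object { $_ -like '*ServicePrincipal*' }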
I have posted a TechNet forum question around the frequency of KCC nomination and rebuilding and I was hoping you could reply to it.
“…He had made an update to the Active Directory Schema and as a safety-net had switched off one of our domain controllers whilst he did it. The DC (2008 R2) that was switched off was at the time acting as the automatically determined bridgehead server for the site.
Obviously the next thing that has to happen is for the KCC to run, discover the bridgehead server is still offline, and re-nominate. My colleague thinks that this re-nomination should take up to 2 hours to happen. However, all the documentation I can find suggests that this should be every 15 minutes. His argument is that it is a process of sampling: it realises the problem every 15 minutes but can take up to 2 hours to actually action the change of bridgehead.
Can anyone tell me which of us is right please and if we could have a problem?”
Answer
We are running an exchange program between MS Support and MS Premier Field Engineering and our current guest is AD topology guru Keith Brewer. He replied in exhaustive detail here:
Attaboy Keith, now you’re doing it our way – when in doubt, use overwhelming force.
Other random goo
If you’re going to jailbreak phones, do it with Microsoft – you get a free handset and t-shirt instead of a subpoena.
The 2011 CES Innovation Honoree awards are out. Holy crap, the Digital Storm Online gaming rig is nom nom nom. I also want the Recon goggles for no legitimate reason.
It’s utterly impossible, but Duke Nukem Forever comes out May 3rd. Trailer is not SFW, as you would expect.
Unless it doesn’t.
Star Wars on Blu-ray coming in September, now up for pre-order. Damn, I guess I have to get Blu-ray. Hopefully Lucas uses the opportunity to remove all midichlorian references.
Hello folks, Ned here again. Today we talk PDCs, DFSN, DFSR, AGPM, authentication, PowerShell, Kerberos, event logs, and other random goo. Let’s get to it.
Is the PDC Emulator required for user authentication? How long can a domain operate without a server that is running the PDC Emulator role?
Answer
It’s not required for direct user authentication unless you are using (unsupported) NT and older operating systems or some Samba flavors. I’ve had customers who didn’t notice their PDCE was offline for weeks or months. Plenty of non-fully routed networks exist where many users have no direct access to that server at all.
However!
It is used for a great many other things:
With the PDCE offline, users who have recently changed their passwords are more likely to get logon or access errors. They will also be more likely to stay locked out if using Account Lockout policies.
Time can more easily get out of sync, leading to Kerberos authentication errors down the road.
The PDCE being offline will also prevent the creation of certain well-known security groups and users when you are upgrading forests and domains.
The AdminSDHolder process will not occur when the PDCE is offline.
You will not be able to administer DFS Namespaces.
It is where group policies are edited (by default).
Finally - and not documented by us - I have seen various non-MS applications over the years that were written for NT and which would stop working if there is no PDCE. There’s no way to know which they might be – a great many were home-made applications written by the customers themselves – so you will have to determine this through testing.
But don’t just trust me; I am a major plagiarizer!
The DFSR help file recommends a full mesh topology only when there are 10 or fewer members. Could you kindly let me know reasons why? We feel that a full mesh will mean more redundancy.
Answer
It’s just trying to prevent a file server administrator from creating an unnecessarily complex or redundant topology, especially since the vast majority of file server deployments do not follow this physical network topology. The help file also makes certain presumptions about the experience level of the reader.
It’s perfectly OK – from a technical perspective – to make as many connections as you like if using Windows Server 2008 or later. This is not the case with Win2003 R2 (see this old post that applies only to that OS). The main downsides to a lot of connections are:
It may lead to replication along slower, non-optimal networks that are already served by other DFSR connections; DFSR does not sense bandwidth or use any site/connection costing. This may itself lead to the networks becoming somewhat slower overall.
It will generate slightly more memory and CPU usage on each individual member server (keeping track of all this extra topology is not free).
It’s more work to administer. And it’s more complex. And more work + more complex usually = less fun.
Question
I'm trying to set up delegation for Kerberos, but I can't configure it for user or computer accounts using AD Users and Computers (DSA.MSC). I’m logged on as a domain administrator. Every time I try to activate delegation I get this error:
The following Active Directory error occurred: Access is denied.
Answer
It’s possible that someone has removed the user right for your account to delegate. Check your applied domain security policy (using RSOP or GPRESULT or whatever) to see if this has been monkeyed up:
Computer Configuration\Windows Settings\Security Settings\Local Policies\User Rights Assignment "Enable computer and user accounts to be trusted for delegation"
The Default Domain Controllers policy will have the built-in Administrators group set for that user right assignment once you create a domain. The privilege serves no purpose on servers other than DCs; they don’t care. Changing the defaults for this assignment isn’t necessary or recommended, for reasons that should now be self-evident.
Question
I want to clear all of my event logs at once on Windows Vista/2008 or later computers. Back in XP/2003 this was pretty easy as there were only 6 logs, but now there are a zillion.
Answer
Your auditors must love you :). Paste this into a batch file and run in an elevated CMD prompt as an administrator:
Wevtutil el > %temp%\eventlistmsft.txt
For /f "delims=;" %%i in (%temp%\eventlistmsft.txt) do wevtutil cl "%%i"
If you run these two commands manually, remember to remove the double percent signs and make them singles; those are being escaped for running in a batch file. I hope you have a system state backup – this is forever!
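If you’d rather do it straight from PowerShell than from a batch file, roughly the same thing works as a one-liner (the same “this is forever” warning applies):

# Enumerate every event log with wevtutil and clear each one. Run elevated.
wevtutil el | ForEach-Object { wevtutil cl "$_" }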
Question
Can AGPM be installed on any DC? Should it be on all DCs? The PDCE?
Answer
[Answer from AGPM guru Sean Wright]
You can install it on any server as long as it’s part of the domain - so a DC, PDCE, or a regular member server. Just needs to be on one computer.
Question
Is it possible to use the Authentication Mechanism Assurance that is available in Windows Server 2008 R2 with a non-Microsoft PKI implementation? Is it possible to use Authentication Mechanism Assurance with any of the service administration groups, Domain Admins or Enterprise Admins? If that is possible, what would be the consequences for the built-in Administrator account – would this account be exempt from Authentication Mechanism Assurance, so that administrators would have a route to fix issues that occurred in the environment (i.e. a get-out-of-jail card)?
Answer
[Answer from security guru Rob Greene]
First, some background:
This only works with Smart Card logon.
This works because the Issuance Policy OID is “added to” msDS-OIDToGroupLink on the OID object in the configuration partition. There is a msDS-OIDToGroupLinkBl (back link) attribute on the group and on the OID object.
The msDS-OIDToGroupLink attribute on the OID object (in the configuration partition) stores the DN of the group that is going to use it.
Not sure why, but the script expects the groups used in this configuration to be Universal groups. So, regarding the question about administrative groups: none of them are Universal groups except “Enterprise Admins”.
So here are the answers:
Is it possible to use Authentication Mechanism Assurance that is available in Windows Server 2008 R2 with a non-Microsoft PKI implementation?
Yes, however, you will need to create the Issuance Policies that you plan to use by adding them through the Certificate Template properties as described in the TechNet article.
Is it possible to use Authentication Mechanism Assurance with any of Service Administration groups Domain Admins or Enterprise Admins?
This implementation requires that the group be a universal group in order for it to be used. So the only group of those listed above that is universal is “Enterprise Admins”. In theory this would work, however in practice it might not be such a great idea.
If that is possible what would be the consequences for built-in administrator account, would this account be exempt from Authentication Mechanism Assurance?
In most cases the built-in Administrator account is special-cased to allow access to certain things even if its access has somehow been limited. However, this isn’t the best way to design the security of administrative accounts if you are concerned about being locked out of the domain. You would have similar issues if you made these administrative accounts require smart cards for logon: if for some reason the CA hierarchy did not publish a new CRL, and fixing the CA required a domain admin to log on interactively, you would be effectively locked out of your domain as well.
Question
I find references on TechNet to a “rename-computer” PowerShell cmdlet added in Windows 7. But it doesn’t seem to exist.
Answer
Oops. Yeah, it was cut very late but still lives on in some documentation. If you need to rename a computer using PowerShell, the approach I use is:
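The original post showed this as a screenshot; the sort of one-liner I mean uses the Rename method on the Win32_ComputerSystem WMI class (the new name here is made up, and a domain-joined machine may also need credentials passed to Rename):

# Rename the local computer in one line; returns 0 in ReturnValue on success.
(Get-WmiObject Win32_ComputerSystem).Rename("NEW-PC-NAME")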
That keeps it all on one line without need to specify an instance first or mess around with variables. You need to be in an elevated CMD prompt logged in as an administrator, naturally.
Then you can run restart-computer and you are good to go.
There are a zillion other ways to rename on the PowerShell command-line, shelling netdom.exe, wmic.exe, using various WMI syntax, new functions, etc.
Question
Does disabling a DFS Namespace link target still give the referral back to clients, maybe with an “off” flag or something? We’re concerned that you might still accidentally access a disabled link target somehow.
Answer
[Oddly, this was asked by multiple people this week.]
Disable actually removes the target from referral responses and nothing but an administrator’s decision can enable it. To confirm this, connect through that DFS namespace and then run this DFSUTIL command-line (you may have to install the Win2003 Support Tools or RSAT or whatever, depending on where you run this):
DFSUTIL /PKTINFO
It will not list out your disabled link targets at all. For example, here I have two link targets – one enabled, one disabled. As far as DFS responds to referral requests, the other link target does not exist at all when disabled.
When DFSR staging fills to the high watermark, what happens to inbound and outbound replication threads? Do we stop replicating until staging is cleared?
Answer
Excellent question, Oz dweller.
When you hit the staging quota 90% high watermark, further staging will stop.
DFSR will try to delete the oldest files to get down to 60% under the quota.
Any files that are on the wire right now being transferred will continue to replicate. Could be one file, could be more.
If those files on the wire are ones that the staging cleanup is trying to delete, staging cleanup will not complete (and you get warning 4206).
No other files will replicate (even if they were not going to be cleaned out due to “newness”).
Once those outstanding active file transfers on the wire complete, staging will be cleaned out successfully.
Files will begin staging and replicating again (at least until the next time this happens).
So the important point about staging space for very large files remains: ensure that the quota is at least as large as the N largest files that could be simultaneously replicated inbound/outbound, or you will choke yourself out. From the DFSR performance tuning post:
Windows Server 2003 R2: 9 largest files
Windows Server 2008: 32 largest files (default registry)
Windows Server 2008 R2: 32 largest files (default registry)
Windows Server 2008 R2 Read-Only: 16 largest files
If you want to find the 32 largest files in a replicated folder, here’s a sample PowerShell command:
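The command itself didn’t survive the formatting here, but a sketch of what I mean (the replicated folder path is hypothetical):

# List the 32 largest files under a replicated folder, biggest first.
Get-ChildItem C:\RF01 -Recurse |
  Where-Object { -not $_.PSIsContainer } |
  Sort-Object Length -Descending |
  Select-Object -First 32 FullName, Length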
If I create a domain-based namespace (\\contoso.com\root) and only have member servers for namespace servers, the share can’t be browsed to in Windows Explorer. It is there, I just can’t browse to it.
But if I add a DC as a namespace server it immediately appears. If I remove the DC from the namespace it disappears from view again, but it is still there. Would this be expected behavior? Is this a “supported” way to create a hidden namespace?
Answer
You are seeing some coincidental behavior based on the dual meaning of contoso.com in this scenario:
Contoso.com will resolve to a domain controller when using DNS
When a DC hosts a namespace share and you are browsing that DC, you are simply seeing all of its shares. One of those shares happens to be a DFS root namespace.
When you are browsing a domain-based namespace not hosted on a DC, you are not going to see that share as it doesn’t exist on the DCs.
You can see what’s happening here under the covers with a network capture.
Users can still access the root and link shares if they type them in or have them set via logon script, mapped drive, GP Preference item, etc. This is only a browsing issue.
It’s not an “unsupported” way to hide shares, but it’s not necessarily effective in the long term. The way to hide and prevent access to the links and files/folders is through permissions and ABE. This solution is like a share with $ being considered hidden: it only works as long as people don’t talk about it. :) Not to mention that this method is easy for other admins to accidentally “break” through ignorance or by reading blog posts telling them all the advantages of running DFS on a DC.
PS: Using a $ does work – at least on a Win2008 R2 DFS root server in a 2008 domain namespace:
IO9.com posted their spring sci-fi book wish list. Which means that I now have eight new books in my Amazon wish list. >_<
As a side note, does anyone like the new format of the Gawker Media blogs? I cannot get used to them and had to switch back to the classic view. The intarwebs seem to be on my side in this. I find myself visiting less often too, which is a real shame – hopefully for them this isn’t another scenario like Digg.com, redesigning itself into oblivion.
Netflix finally gets some serious competition – Amazon Prime now includes free TV and Movie streaming. Free as in $79 a year. Still, very competitive pricing and you know they will rock the selection.
I get really mad watching the news as it seems to be staffed primarily by plastic heads reading copy written by people that should be arrested for inciting to riot. So this Cracked article on 5 BS modern myths is helpful to reduce your blood pressure. As always, it is not safe for work and very sweary.
But while you’re there anyway (come on, I know you), check out the kick buttitude of Abraham Lincoln.
Finally: why are the Finnish so awesomely insane at everything?
Hi folks, Ned here again. I recently wrote a KB article about some expected DCDIAG.EXE behaviors. This required reviewing DCDIAG.EXE, as I wasn’t finding anything deep in TechNet about the “Services” test that had my interest. By the time I was done, I had found a dozen other test behaviors I had never known existed. While we have documented the version of DCDIAG that shipped with Windows Server 2008 – sometimes with excellent specificity, like Justin Hall’s article about the DNS tests – mostly it’s a black box and you only find out what it tests when the test fails. Oh, we have help of course: just run DCDIAG /? to see it. But it’s help written by developers. Meaning you get wording like this:
Advertising Checks whether each DSA is advertising itself, and whether it is advertising itself as having the capabilities of a DSA.
So, it checks each DSA (whatever that is) to see if it’s advertising (whatever that means). The use of an undefined acronym is an especially nice touch, as even within Microsoft, DSA could mean:
Naturally, this brings out my particular brand of OCD. What follows is the result of my compulsion to understand. I’m not documenting every last switch in DCDIAG, just the tests. I am only documenting Windows Server 2008 R2 SP1 behavior – I have no idea where the source code is for the ancient Support Tools version of DCDIAG and you aren’t paying me enough here to find it :-). The Windows Server 2008 RTM through Windows Server 2008 R2 SP1 versions are nearly identical except for bug fixes:
Everything I describe below you can discover and confirm yourself with careful examination of network captures and logging, including the public functions being used – but why walk when you can ride? Using /v can also provide considerable details on some tests. No internal source code is described nor do I show any special hidden functionality.
For info on all the network protocols I list out – or if you run into network errors when using DCDIAG – see Service overview and network port requirements for the Windows Server system. I went pretty link-happy in general in this post to help people using it as a reference; that way if you just look at your one little test it has all the info you need. I don’t always call out name resolution being tested because it is implicit; it’s also testing TCP, UDP, and IP.
Finally: this post is more of a reference than my usual lighthearted fare. Do not operate heavy machinery while reading.
Initial Required Tests
This tests general connectivity and responsiveness of a DC, to include:
The DNS test can be satisfied out of the client cache so restarting the DNS client service locally is advisable when running DCDIAG to guarantee a full test of name resolution. For example:
Net stop "dns client" & net start "dns client" & dcdiag /test:verifyreplicas /s:DC-01
The initial tests cannot be skipped.
The initial tests use ICMP, LDAP, DNS, and RPC on the network.
Editorial note: Blocking ICMP will prevent DCDIAG from working. While blocking ICMP is highly recommended at the Internet-edge of your network, internally blocking ICMP traffic mainly just leads to administrative headaches like breaking legacy group policy, breaking black hole router detection (or leading to highly inefficient MTU sizes due to lack of a discovery option), and breaking troubleshooting tools like ping.exe or tracert.exe. It creates an illusion of security; there are a great many other easy ways for a malicious internal user to locate computers.
Advertising
This test validates that the public DsGetDcName function used by computers to locate domain controllers will correctly locate any DCs specified on the command line with the /s, /a, or /e parameters. It checks that the server successfully reports itself with DS_Flags for:
GC or not (and if claiming to be a GC, whether the GC is ready to respond to requests)
Note that “advertising” is not the same as “working”. For instance, if the KDC service is stopped the Advertising test will fail since the flag returned from DsGetDcName will not include KDC. But if port 88 over TCP and UDP are blocked on a firewall, the Advertising test will pass – even though the KDC is not going to be able to answer requests for Kerberos tickets.
This test is done using RPC over SMB (using a Netlogon named pipe) to the DC plus LDAP to locate the DCs site information.
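If you want to see those same DsGetDcName flags outside of DCDIAG, nltest.exe will query the DC locator for you; for example (the domain name is a placeholder):

# Show the DS_Flags returned for the domain (GC, KDC, TIMESERV, WRITABLE, and so on).
# /force bypasses the locator cache so you get a fresh answer.
nltest /dsgetdc:contoso.com /force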
CheckSDRefDom
This test validates that your application partition cross reference objects (located in “cn=partitions,cn=configuration,dc=<forest root domain>”) contain the correct domain names in their msDS-SDReferenceDomain attributes. The test uses LDAP.
I find no history of anyone ever seeing the error message that can be displayed here.
This test does a variety of checks around the security components of a DC like Kerberos. For it to be more specifically useful you should provide /replsource:<some partner DC> as the default checks are not as comprehensive.
This test:
Validates that at least one KDC is online for each domain and they are reachable (first in the same site, then anywhere in the domain if that fails)
Checks if packet fragmentation of Kerberos over UDP might be an issue based on current MTU size by sending non-fragmenting ICMP packets
Checks if the DC’s computer account exists in AD, if it’s within the default “Domain Controllers” OU, if it has the correct UserAccountControl flags for DCs, that the correct ServerReference attributes are set, and if the minimum Service Principal Names are set
Validates that the DCs computer object has replicated to other DCs
Validates that there are no replication or KCC connection issues for connected partners by querying the function DsReplicaGetInfo to get any security-related errors
When the /replsource is added, a few more tests happen. The partner is checked for all of the above also, then:
Time skew is calculated between the servers to verify it is less than 300 seconds for Kerberos. It does not check the Kerberos policy to see if allowed skew has been modified
Permissions are checked on all the naming contexts (such as Schema, Configuration, etc.) on the source DC to validate that replication and connectivity will work between DCs
Connectivity is checked to validate that the user running DCDIAG (and therefore in theory, all other users) can connect to and read the SYSVOL and NETLOGON shares without any security errors. It also checks IPC$, but inability to connect there would have broken many earlier tests
The "Access this computer from the network" privilege on the DC is checked to verify it is held by Administrators, Authenticated Users, and Everyone groups
The DC's computer object is checked to ensure it is the latest version on the DCs. This is done to prove replication convergence since a very stale DC might lead to security issues for users, problems with the DCs own computer account password, or secure channels to other servers. It checks versions, USNs, originating servers, and timestamps
These tests are performed using LDAP, RPC, RPC over SMB, and ICMP.
Connectivity
No matter what you specify for tests, this always runs as part of Initial Required Tests.
CrossRefValidation
This test retrieves a list of naming contexts (located in “cn=partitions,cn=configuration,dc=<forest root domain>”) with their cross references and then validates them, similar to the CheckSDRefDom test above. It is looking at the nCName, dnsRoot, nETBIOSName, and systemFlags attributes to:
Make sure the names or DNs are not invalid or null
Confirm DNs are not otherwise mangled with CNF or 0ADEL (which happens during Conflict or Deletion operations)
Ensure the systemFlags are correct for that object
Tests the AD replication topology to ensure there are no DCs without working connection objects between partners. Any servers that cannot replicate inbound or outbound from any DCs are considered “cut off”. It uses the function DsReplicaSyncAll to do this which means this “test” actually triggers replication on the DCs so use with caution if you are the owner of crud WAN links that you keep clean with schedules, and certainly consider this before using /e.
This test is rather misleading in its help description; if it cannot contact a server that is actually unavailable to LDAP on the network then it gives no error or test results, even if the /v parameter is specified. You have to notice that there is no series of “analyzing the alive system replication topology” or “performing upstream (of target) analysis” messages being printed for a cutoff server. However, the Connectivity test will fail if the server is unreachable so it’s a wash.
The DCpromo test is one of the two oddballs in DCDIAG (the other is ‘DNS’). It is designed to test how well a DCPROMO would proceed if you were to run it on the server where DCDIAG is launched. It also has a number of required switches for each kind of promotion operation. All of the tests are against the server specified first in the client DNS settings. It tests:
If at least one network adapter has a primary DNS server set
That the proposed authoritative DNS zone can be contacted
If dynamic DNS updates are possible for the server’s A record. It checks both the setting on the authoritative DNS zone as well as the client registry configuration of DnsUpdateOnAllAdapters and DisableDynamicUpdate
If an LDAP DClocator record (i.e. “_ldap._tcp.dc._msdcs.<domain>”) is returned when querying for existing forests
This test validates the File Replication Service’s health by reading (and printing, if using /v) FRS event log warning and error entries from the past 24 hours. It’s possible this service won’t be running or installed on Windows Server 2008 or later if SYSVOL has been migrated to DFSR. On Windows Server 2008, some events may be misleading as they may refer to custom replica sets and not necessarily SYSVOL; on Windows Server 2008 R2, however, FRS can be used for SYSVOL only.
By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.
This test validates the Distributed File System Replication service’s health by reading (and printing, if using /v) DFSR event log warning and error entries from the past 24 hours. It’s possible this service won’t be running or installed on Windows Server 2008 if SYSVOL is still using FRS; on Windows Server 2008 R2 the service is always present on DCs. While this ostensibly tests DFSR-enabled SYSVOL, any errors within custom DFSR replication groups would also appear here, naturally.
By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.
The value name has to exist with a value of 1 to pass the test. This test will work with either FRS or DFSR-replicated SYSVOLs. It doesn’t check if the SYSVOL and NETLOGON shares are actually accessible, though (CheckSecurityError does that).
The test uses RPC over SMB (through a named pipe to WinReg).
LocatorCheck
This test validates that DCLocator queries return the five “capabilities” that any DC must know of to operate correctly.
If not hosting one, the DC will refer to another DC that can satisfy the request; this means that you must carefully examine this under /v to make sure a server you thought was supposed to be holding a capability actually is correctly returned. If no DC answers or if the queries return errors then the test will fail.
This test uses Directory Replication Service (DRS) functions to check for conditions that would prevent inter-site AD replication within a specific site or all sites:
Locates and connects to the Intersite Topology Generators (ISTGs)
You must be careful with this test’s command-line arguments and always provide /a or /e. Not providing a site means that the test runs but skips actually testing anything (you can see this under /v).
All tests use RPC over the network to test the replication aspects and will make registry connections (RPC over SMB to WinReg) to check for those NTDS settings override entries. LDAP is also used to locate connection info.
KccEvent
This test queries the Knowledge Consistency Checker on a DC for KCC errors and warnings generated in the Directory Services event log during the last 15 minutes. This 15 minute threshold is irrespective of the Repl topology update period (secs) registry value on the DC.
By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.
This test returns the DC's knowledge of the five Flexible Single Master Operation (FSMO) roles. The test does not inherently check all DCs knowledge for consistency, but using the /e parameter would provide data sufficient to allow comparison.
The minimum Service Principal Names are set. For those paying close attention, this is identical to one test aspect of CheckSecurityError; this is because they use the same internal test
This test also mentions two repair options:
/RecreateMachineAccount will recreate a missing DC computer object. This is not a recommended fix as it does not recreate any child objects of a DC, such as FRS and DFSR subscriptions. The best practice is to use a valid SystemState backup to authoritatively restore the DC's deleted object and child objects. If you do use this /RecreateMachineAccount option then the DC should then be gracefully demoted and promoted to repair all the missing relationships
/FixMachineAccount will add the UserAccountControl flags to a DCs computer object for “TRUSTED_FOR_DELEGATION” and “SERVER_TRUST_ACCOUNT”. It’s safe to use as a DC missing those bit flags will not function and it does not remove other bit flags present. Using this repair option is preferred over trying to set these flags yourself through ADSIEDIT or other LDAP editors
This test checks permissions on all the naming contexts (such as Schema, Configuration, etc.) on the source DC to validate that replication and connectivity will work between DCs. It makes sure that “Enterprise Domain Controllers” and “Administrators” groups have the correct minimum permissions. This is the same performed test within CheckSecurityError.
Validate that the user running DCDIAG (and therefore in theory, all other users) can connect to and read the SYSVOL and NETLOGON shares without any security errors. It also checks IPC$, but inability to connect there would have broken many earlier tests
Verify that the Administrators, Authenticated Users, and Everyone group have the “access this computer from the network” privilege on the DC. If not, you’d see a ton of other errors here though, naturally
Both of these tests are also performed by CheckSecurityError.
The tests use SMB and RPC over SMB (through named pipes).
ObjectsReplicated
This test verifies that replication of a few key objects and attributes has occurred and displays up-to-dateness info if replication is stale. By default the two objects validated are:
The “CN=NTDS Settings” object of each DC exists and is up to date on all other DCs.
The “CN=<DC name>” object of each DC exists and is up to date on all other DCs.
This test is not valuable unless run with /e or /a as it just asks the DC about itself when those are not specified. Using /v will give more details on objects thought to be stale based on version.
You can also specify arbitrary objects to test with /objectdn /n, which can be useful after creating a “canary” object to validate replication.
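For example, after creating a canary object you could check that every DC has it with something like this (the object DN and naming context are placeholders for your own):

dcdiag /test:ObjectsReplicated /e /objectdn:"cn=canary,ou=Test,dc=contoso,dc=com" /n:contoso.com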
This test is designed to check external trusts. It does not run by default, and it fails even when provided correct /testdomain parameters, a secure channel validated with NLTEST.EXE, and a working external trust. It does state that the secure channel is valid but then mistakenly reports that there are no working trust objects. I’ll update this post when I find out more. This test should not be used.
RegisterLocatorDnsCheck
Validates many of the same aspects as the Dcpromo test. It requires the /dnsdomain switch to specify a domain that would be the target of registration; this can be a different domain than the current primary one. It specifically verifies:
If at least one network adapter has a primary DNS server set.
That the proposed authoritative DNS zone can be contacted
If dynamic DNS updates are possible for the server’s A record. It checks both the setting on the authoritative DNS zone as well as the client registry configuration of DnsUpdateOnAllAdapters and DisableDynamicUpdate
If an LDAP DC locator record (i.e. “_ldap._tcp.dc._msdcs.<domain>”) is returned when querying for existing forests
This role must be online and accessible for DCs to be able to create security principals (users, computers, and groups) as well as for further DCs to be promoted within a domain.
This test validates that various AD-dependent services are running, accessible, and set to specific start types:
RPCSS - Start Automatically – Runs in Shared Process
EVENTSYSTEM - Start Automatically - Runs in Shared Process
DNSCACHE - Start Automatically - Runs in Shared Process
NTFRS - Start Automatically - Runs in Own Process (if domain functional level is less than Windows Server 2008. Does not trigger on SYSVOL being replicated by FRS)
ISMSERV - Start Automatically - Runs in Shared Process
KDC - Start Automatically - Runs in Shared Process
SAMSS - Start Automatically - Runs in Shared Process
SERVER - Start Automatically - Runs in Shared Process
WORKSTATION - Start Automatically - Runs in Shared Process
W32TIME - Start Manually or Automatically - Runs in Shared Process
NETLOGON - Start Automatically - Runs in Shared Process
(If target is Windows Server 2008 or later)
NTDS - Start Automatically - Runs in Shared Process
DFSR - Start Automatically - Runs in Own Process (if domain functional level is Windows Server 2008 or greater. Does not trigger on SYSVOL being replicated by DFSR)
(If using SMTP-based AD replication)
IISADMIN - Start Automatically - Runs in Shared Process
SMTPSVC - Start Automatically - Runs in Shared Process
These are the “real” service names listed in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services. If this test is specified when targeting Windows Server 2003 DCs it is expected to fail on RpcSs. See KB2512643.
This test validates the System Event Log’s health by reading and printing entries from the past 60 minutes (stopping at computer startup timestamp if less than 60 minutes). Errors and warnings will be printed, with no evaluation done of them being expected or not – this is left to the DCDIAG user.
By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.
For disconnected topologies (i.e. missing connection objects), both upstream and downstream from each reference DC
The test uses DsReplicaSyncAll with the DS_REPSYNCALL_DO_NOT_SYNC flag, meaning that the test analyzes and validates the replication topology without actually replicating changes. The test does not validate the availability of replication partners – having a partner offline will not cause failures in this test. Nor does it test whether the schedule is completely closed, which would prevent replication; to see actual replication results, use the Replications or CutoffServers tests.
This test verifies computer reference attributes for all DCs, including:
ServerReference attribute correct for a DC on cn=<DC name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
ServerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
frsComputerReference attribute correct for a DC site object on cn=domain system volume (sysvol share),cn=ntfrs subscriptions,cn=<DC Name>,ou=domain controllers,DC=<domain>
frsComputerReferenceBL attribute correct for a DC object on cn=<DC Name>,cn=domain system volume (sysvol share),cn=file replication service,cn=system,dc=<domain>
hasMasterNCs attribute correct for a DC on cn=ntds settings,cn=<DC Name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
nCName attribute correct for a partition at cn=<partition name>,cn=partitions,cn=configuration,dc=<domain>
msDFSR-ComputerReference attribute correct for a DC DFSR replication object on cn=<DC Name>,cn=topology,cn=domain system volume,cn=dfsr-globalsettings,cn=system,dc=<domain>
msDFSR-ComputerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
Note that the two DFSR checks are only performed if the domain functional level is Windows Server 2008 or higher. This means there will be an expected failure if SYSVOL has not yet been migrated to DFSR, as the test does not actually care whether FRS is still in use.
The test uses LDAP. The DCs are not all individually contacted; only the specified DCs are contacted.
VerifyReferences
This test verifies computer reference attributes for a single DC, including:
ServerReference attribute correct for a DC on cn=<DC name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
ServerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
frsComputerReference attribute correct for a DC site object on cn=domain system volume (sysvol share),cn=ntfrs subscriptions,cn=<DC Name>,ou=domain controllers,DC=<domain>
frsComputerReferenceBL attribute correct for a DC object on cn=<DC Name>,cn=domain system volume (sysvol share),cn=file replication service,cn=system,dc=<domain>
msDFSR-ComputerReference attribute correct for a DC DFSR replication object on cn=<DC Name>,cn=topology,cn=domain system volume,cn=dfsr-globalsettings,cn=system,dc=<domain>
msDFSR-ComputerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
This is similar to the VerifyEnterpriseReferences test except that it does not check partition cross references or all other DC objects.
This test verifies that the specified server does indeed host the application partitions specified by its crossref attributes in the partitions container. It operates exactly like CheckSDRefDom except that it does not show output data and validates hosting.
Hi folks, Ned here again. It’s been nearly a month since the last Mail Sack post so I’ve built up a good head of steam. Today we discuss FRS, FSMO, Authentication, Authorization, USMT, DFSR, VPN, Interactive Logon, LDAP, DFSN, MS Certified Masters, Kerberos, and other stuff. Plus a small contest for geek bragging rights.
I’ve read TechNet articles stating that the PDC Emulator is contacted when authentication fails - in case a newer password is available - and the PDCE would know this. What isn’t stated explicitly is whether the client contacts the PDCE directly, or whether the current DC contacts the PDCE on behalf of the client. This is important to us as our clients won’t always have a routable connection to the PDCE but our DCs will; a DMZ/Perimeter network scenario basically.
The other DC is named 2008r2-srv-02 (10.70.0.102).
The client is named 7-x86-sp1-01 (10.70.0.111).
I configured the PDCE firewall to block ALL traffic from the client IP address. The PDCE can only hear from the other DC, like in your proposed DMZ. The non-PDCE and client can talk without restriction.
1. I use some bad credentials on my Windows 7 client (using RunAs to start notepad.exe as my Tony Wang account)
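(For reference, the RunAs command looks something like the line below – the domain and account name are just illustrative – and I then type a deliberately wrong password at the prompt:)

runas /user:CONTOSO\tonywang notepad.exe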
Can you help me understand cached domain logons in more detail? At the moment I have many Windows XP laptops for mobile users. These users log on to the laptops using cached domain logons. Afterwards they establish a VPN connection to the company network. We have some third-party software and group policies that don’t work in this scenario, but they work perfectly if the user logs on to our corporate network instead of over the VPN, using the exact same laptop.
Answer
We don’t do a great job in documenting how the cached interactive logon credentials work. There is some info here that might be helpful, but it’s fairly limited:
But from hearing this scenario many times, I can tell you that you are seeing expected behavior. Since in your scenario a user logs on interactively with cached creds (stored in encrypted form under HKEY_LOCAL_MACHINE\Security\Cache) while offline from a DC, and only afterwards gets a network connection created and accesses resources, anything that only happens at the interactive logon phase is not going to work. For example, logon scripts delivered by AD or group policy. Or security policies that apply when the computer is started back up (and won’t apply for another 90-120 minutes while VPN connected – which may not actually happen if the user only starts the VPN for short periods).
I made a hideous flowchart to explain this better. It works – very oversimplified – like this:
As you can see, with a VPN not yet running, it is impossible to access a number of resources at interactive logon. So if your application’s “resource authentication” only works at interactive logon, there is nothing you can do unless the app changes.
This is why we created VPN at Logon and DirectAccess – there would be no reason to make use of those technologies otherwise.
If you have a VPN solution that doesn’t allow XP to create the “dial-up network” at interactive logon, that’s something your remote-access vendor has to fix. Nothing we can do for you I’m afraid.
Question
Can DFSR use security protocols other than Kerberos? I see that it has an SPN registered but I never see that SPN used in my network captures or ticket cache.
DFSR uses Kerberos auth exclusively. The DFSR client’s TGS request does not contain the DFSR SPN, only the HOST computer name. So the special looking DFSR SPN is - pointless. It’s one of those “almost implemented” features you occasionally see. :)
Let’s look at this in action.
Two DFSR (06 and 07) servers doing initial sync, talking to their DC (01). TGS requests/responses, using only the computer HOST name SPNs:
Then the DFSR service opens RPC connections between each server and uses Kerberos to encrypt the RPC traffic with RPC_C_AUTHN_LEVEL_PKT_PRIVACY, using RPC_C_AUTHN_GSS_NEGOTIATE and requiring RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH. Since NTLM doesn’t support mutual authentication, DFSR can only use Kerberos:
USMT is telling you the size estimate based on your possible NTFS cluster sizes. So 4096 means a 4096-byte cluster size will take 434,405,376 bytes (or ~414MB) in an uncompressed store. Starting in USMT 4.0 though, the /P option was extended and now allows you to specify an XML output file. It’s a little more readable and includes temporary space needs:
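(The example output isn’t reproduced here; the command that generates the estimate file looks something like this – the paths are placeholders, and check scanstate /? for the exact switch combinations your USMT build supports:)

scanstate.exe c:\store /p:c:\usmtsize.xml /i:migdocs.xml /i:migapp.xml /nocompress /c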
Good eye. DFSR uses LDAP to poll Active Directory in two ways in order to detect changes to the topology:
1. Every five minutes (hard-coded wait time) light polling checks to see if subscriber objects have changed under the computer’s Dfsr-LocalSettings container. If not, it waits another five minutes and tries again. If there is something new, it does a full LDAP lookup of all the settings in the Dfsr-GlobalSettings and its Dfsr-LocalSettings container, slurps down everything, and acts upon it.
2. Every sixty minutes (configurable wait time) it slurps down everything just like a light poll that detected changes, no matter if a change was detected or not. Just to be sure.
Want to skip these timers and go for an update right now? DFSRDIAG.EXE POLLAD.
"The current VV join is inherently inefficient. During normal replication, upstream partners build a single staging file, which can source all downstream partners. In a VV join, all computers that have outbound connections to a new or reinitialized downstream partner build staging files designated solely for that partner. If 10 computers do an initial join from \\Server1, the join builds 10 files in stage for each file being replicated."
Is this true – even if the file is identical FRS makes that many copies? What about DFSR?
Answer
It is true. On the FRS hub server you need staging as large as the largest file x15 (if you have 15 or more spokes) or you end up becoming rather ‘single threaded’; a big file goes in, gets replicated to one server, then tossed. Then the same file goes in, gets replicated to one server, gets tossed, etc.
Here I create this 1Gb file with my staging folder set to 1.5 GB (hub and 2 spokes):
Note how the file name and modified date are changing here in staging as it goes through one file at a time, as that’s all that can fit. If I made the staging 3GB, I’d be able to get both downstream servers replicating at once, but there would definitely be two identical copies of the same file:
Luckily, you are not using FRS to replicate large files anymore, right? Just SYSVOL, and you’re planning to get rid of that also, right? Riiiiiiiiggghhhht?
DFSR doesn’t do this – one file gets used for all the connections in order to save IO and staging disk space. As long as you don’t hit quota cleanup, a staged file will stay there until doomsday and be used infinitely. So when it works on say, 32 files at once, they are all different files.
Question
Are there any DFSR registry tuning options in Windows Server 2003 R2? This article only mentions Win2008 R2.
Answer
No, there are none. All of the non-OS-specific ones listed are still valuable though:
Consider multiple hubs
Increase staging quota
Latest QFE and SP
Turn off RDC on fast connections with mostly smaller files
Consider and test anti-virus exclusions
Pre-seed the data when setting up a new replicated folder
Use 64-bit OS with as much RAM as possible on hubs
Use the fastest disk subsystem you can afford on hubs
Use reliable networks <-- this one is especially important on 2003 R2 as it does not support asynchronous RPC
Question
Is there a scriptable way to do what DFSUTIL.EXE CLIENT PROPERTY STATE ACTIVE or Windows Explorer’s DFS “Set Active” tab does? Perhaps with PowerShell?
Not a cmdlet (not even .NET), but could eventually be exposed by .NET’s DLLImport and thusly, PowerShell. Which sounds really, really gross to me.
Or just drive DFSUTIL.EXE in your code. I hesitate to ask why you’d want to script this. In fact, I don’t want to know. :)
Question
Are there problems with a user logging on to their new destination computer before USMT loadstate is run to migrate their profile?
Answer
Yes, if they then start Office 2007/2010 apps like Word, Outlook, Excel, etc. portions of their Office migration will not work. Office relies heavily on reusing its own built-in ‘upgrade’ code:
Note To migrate application settings, you must install applications on the destination computer before you run the loadstate command. For Office installations, you must run the LoadState tool to apply settings before you start Office on the destination computer for the first time by using a migrated user. If you start Office for a user before you run the LoadState tool, many settings of Office will not migrate correctly.
Other applications may be similarly affected, Office is just the one we know about and harp on.
Question
I am seeing very often that a process named DFSFRSHOST.EXE is taking 10-15% CPU resources and at the same time the LAN is pretty busy. Some servers have it and some don’t. When the server is rebooted it doesn’t appear for several days.
Answer
Someone is running DFSR health reports on some servers and not others – that process is what gathers DFSR health data on a server. It could be that someone has configured scheduled reports to run with DFSRADMIN HEALTH, or is just running it using DFSMGMT.MSC and isn’t telling you. If you have an enormous number of files being replicated the report can definitely run for a long time and consume some resources; best to schedule it off hours if you’re in “millions of files” territory, especially on older hardware and slower disks.
Question
FRS replication is not working for SYSVOL in my domain after we started adding our new Win2008 R2 DCs. I see this endlessly in my NTFRS debug logs:
Is FRS compatible between Win2003 and Win2008 R2 DCs?
Answer
That type of error makes me think you have some intrusion protection software installed (perhaps on the new servers, in a different version than on the other servers) or something is otherwise altering data on the network (such as when going through a packet-inspecting firewall).
We only ever see that issue when it is caused by a third party. There are no problems with FRS on 2003, 2008, or 2008 R2 DCs talking to each other. The FRS RPC code has not changed in many years.
You should get double-sided network captures and see if something is altering the traffic between the two servers. Everything RPC should look identical in both captures, down to a payload level. You should also try *removing* any security software from the 2 DCs and retesting (not disabling; that does nothing for most security products – their drivers are still loaded when their services are stopped).
Question
When I run USMT 4.0 scanstate using /nocompress I see a catalog.mig created. It seems to vary in size a lot between various computers. What is that?
Answer
It contains all the non-file goo collected during the gather; mainly the migrated registry data.
Other Stuff
James P Carrion has been posting a very real look into the MS Certified Masters program as seen through the eyes of a student working towards his Directory Services cert. If you’ve thought about this certification I recommend you read on, it’s fascinating stuff. Start at the oldest post and work forward; you can actually see his descent into madness…
----------
Microsoft uses a web-based system for facilities requests. The folks that run that department are excellent and the web system usually works great. Every so often though, you get something interesting like this…
Name both movies in which this picture appears. The first correct reply in the Comments gets the title of “Silverback Alpha Geek”. And nothing else… it’s a cruel world.
Hi folks, Ned here again. Today we discuss trusts rules around domain names, attribute uniqueness, the fattest domains we’ve ever seen, USMT data-only migrations, kicking FRS while it’s down, and a few amusing side topics.
I have two forests with different DNS names, but with duplicate NetBIOS names on the root domain. Can I create a forest (Kerberos) trust between them? What about NTLM trusts between their child domains?
Answer
You cannot create external trusts between domains with the same name or SID, nor can you create Kerberos trusts between two forests with the same name or SID. This includes both the NetBIOS and FQDN version of the name – even if using a forest trust where you might think that the NB name wouldn’t matter – it does. Here I am trying to create a trust between fabrikam.com and fabrikam.net forests – I get the super useful error:
“This operation cannot be performed on the current domain”
But if you are creating external (NTLM, legacy) trusts between two non-root domains in two forests, as long as the FQDN and NB name of those two non-root domains are unique, it will work fine. They have no transitive relationship.
So in this example:
You cannot create a domain trust nor a forest trust between fabrikam.com and fabrikam.net
You can create a domain (only) trust between left.fabrikam.com and right.fabrikam.net
You cannot create a domain trust between fabrikam.com and right.fabrikam.net
You cannot create a domain trust between fabrikam.net and left.fabrikam.com
Why don’t the last two work? Because the trust process thinks that the trust already exists due to the NetBIOS name match with the child’s parent. Arrrgh!
You could still have serious networking problems in this scenario regardless of the trust. If there are two same-named domains physically accessible through the network from the same computer, there may be a lot of misrouted communication when people just use NetBIOS domain names. They need to make sure that no one ever has to broadcast NetBIOS to find anything – their WINS environments must be perfect in both forests and they should convert all their DFS to using DfsDnsConfig. Alternatively they could block all communication between the two root domains’ DCs, perhaps at a firewall level.
Note: I am presuming your NetBIOS domain name matches the left-most part of the FQDN name. Usually it does, but that’s not a requirement (and not possible if you are using more than 15 characters in that name).
Question
Is it possible to enforce uniqueness for the sAMAccountName attribute within the forest?
The Active Directory schema does not have a mechanism for enforcing uniqueness of an attribute. Those cases where Active Directory does require an attribute to be unique in either the domain (sAMAccountName) or forest (objectGUID) are enforced by other code – for example, AD Users and Computers won’t let you do it:
The only way you could actually achieve this is to have a custom user provisioning application that would perform a GC lookup for an account with a particular sAMAccountName, and would only permit creation of the new object should no existing object be found.
[Editor’s note: If you want to see what happens when duplicate user samaccountname entries are created, try this on for size in your test lab:
1. Enable AD Recycle Bin.
2. Create an OU called Sales.
3. Create a user called 'Sara Davis' with a logon name and pre-Windows 2000 logon name of 'saradavis'.
4. Delete the user.
5. In the Users container, create a user called 'Sara Davis' with a logon name and pre-Windows 2000 logon name of 'saradavis' (simulating someone trying to get that user back up and running by creating it new, like a help desk would do for a VIP in a hurry).
6. Restore the deleted 'Sara Davis' user back to her previous OU (this will work because the DNs do not match and the recreated user is not really the restored one), using a restore command along the lines of the sketch after this list.
7. Despite the above error, the user account will in fact be restored successfully and will now exist in both the Sales OU and the Users container, with the same sAMAccountName and userPrincipalName.
8. Log on as SaraDavis using the NetBIOS-style name.
9. Log off.
10. Note in DSA.MSC how 'Sara Davis' in the Sales OU now has a 'pre-Windows 2000' logon name of $DUPLICATE-<something>.
11. Note how both copies of the user have the same UPN.
12. Log on with the UPN name of saradavis@consolidatedmessenger.com and note that this attribute does not get mangled.]
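(The original restore command isn’t reproduced above; with the AD PowerShell module and the Recycle Bin enabled, a sketch along these lines performs the restore in step 6 – treat the filter as illustrative:)

Get-ADObject -Filter 'samaccountname -eq "saradavis" -and isDeleted -eq $true' -IncludeDeletedObjects | Restore-ADObject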
Which customer in the world has the most number of objects in a production AD domain?
Answer
Without naming specific companies - I have to protect their privacy - the single largest “real” domain I have ever heard of had ~8 million user objects and nearly nothing else. It was used as auth for a web system. That was back in Windows 2000 so I imagine it’s gotten much bigger since then.
I have seen two other customers (inappropriately) use AD as a quasi-SQL database, storing several hundred million objects in it as ‘transactions’ or ‘records’ of non-identity data, while using a custom schema. This scaled fine for size but not for performance, as they were constantly writing to the database (sometimes at a rate of hundreds of thousands of new objects a day) and the NTDS.DIT is - naturally - optimized for reading, not writing. The performance overall was generally terrible as you might expect. You can also imagine that promoting a new DC took some time (one of them called about how initial replication of a GC had been running for 3 weeks; we recommended IFM, a better WAN link, and to stop doing that $%^%^&@).
For details on both recommended and finite limits, see:
The real limit on objects created per DC is 2,147,483,393 (or 2^31 minus 255). The real limit on users/groups/computers (security principals) in a domain is 1,073,741,823 (or 2^30 minus 1). If you find yourself getting close on the latter you need to open a support case immediately!
Question
Is a “Data Only” migration possible with USMT? I.e. no application settings or configuration is migrated, only files and folders.
Answer
Sure thing.
1. Generate a config file with:
scanstate.exe /genconfig:config.xml
2. Open that config.xml in Notepad, then search and replace “yes” with “no” (including the quotation marks) for all entries. Save that file. Do not delete the lines, or think that not including the config.xml has the same effect – that will lead to those rules processing normally.
3. Run your scanstate, including config.xml and NOT including migapp.xml. For example:
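(The original example isn’t shown here; a command along these lines fits the bill – the store path is a placeholder:)

scanstate.exe \\server\share\store /i:migdocs.xml /config:config.xml /o /c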
If you look through your log after using the steps above, none of those will appear.
You might also think that you could just rename the DLManifests and ReplacementManifests folders to get the same effect, and you’d almost be right. The problem is that Vista and Windows 7 also use the built-in %systemroot%\winsxs\manifests folder, and you certainly cannot remove that. Just go with the config.xml technique.
Question
After we migrate SYSVOL from FRS to DFSR on Windows Server 2008 R2, we still see that the FRS service is set to automatic. Is it ok to disable?
Answer
Absolutely. Once an R2 server stops replicating SYSVOL with FRS, it cannot use that service for any other data. If you try to start the FRS service or replicate with it, it will log events like these:
Log Name: File Replication Service
Source: NtFrs
Date: 1/6/2009 11:12:45 AM
Event ID: 13574
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: 7015-SRV-03.treyresearch.net
Description: The File Replication Service has detected that this server is not a domain controller. Use of the File Replication Service for replication of non-SYSVOL content sets has been deprecated and therefore, the service has been stopped. The DFS Replication service is recommended for replication of folders, the SYSVOL share on domain controllers and DFS link targets.
Log Name: File Replication Service
Source: NtFrs
Date: 1/6/2009 2:16:14 PM
Event ID: 13576
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: 7015-srv-01.treyresearch.net
Description: Replication of the content set "PUBLIC|FRS-REPLICATED-1" has been blocked because use of the File Replication Service for replication of non-SYSVOL content sets has been deprecated. The DFS Replication service is recommended for replication of folders, the SYSVOL share on domain controllers and DFS link targets.
We document this in the SYSVOL Replication Migration Guide but it’s easy to miss and a little confusing – this article applies to both R2 and Win2008, and Win2008 can still use FRS:
7. Stop and disable the FRS service on each domain controller in the domain unless you were using FRS for purposes other than SYSVOL replication. To do so, open a command prompt window and type the following commands, where <servername> is the Universal Naming Convention (UNC) path to the remote server:
Sc \\<servername> stop ntfrs
Sc \\<servername> config ntfrs start= disabled
Other stuff
Another week, another barrage of cloud.
(Comic courtesy of the http://xkcd.com/ blog.)
Finally… friggin’ Word. Play this at 720p, full screen.
----
Have a great weekend folks.
- Ned “waiting on yet more email from humor-impaired MS colleagues about his lack of professionalism” Pyle
Hi folks, Ned here again. This week we talk about 10 reasons not to use list object access dsheuristics, USMT trivia nuggets, poor man’s DFSDIAG, how to get network captures without installing a network capture tool, and some other random goo. Oh yeah, and friggin’ Smurfs.
We’re thinking about using List Object Access dsheuristics mode to control people seeing data in Active Directory. Are there any downsides to this?
Answer
There are a few – here are at least ten in no particular order (thanks to PFE Matt Reynolds for some of these, although he may never realize it):
This can greatly increase the number of access check calls that are made, and can have a significant negative effect on performance.
This will require a huge amount of work and ongoing maintenance. You will need to create and look after – forever - selective “views” for admins, help desks, service accounts, etc.
Microsoft applications are not generally tested with this setting.
If you can find a third party vendor that tests this, I will have a heart attack and die from shock. If you can then find a vendor that is willing to change their code if you run into problems, I will then rise from the grave and eat my own pants.
It’s very difficult to test how well apps are handling this, as it’s designed to “omit data”. That could have all sorts of weird effects on apps expecting to see certain built-in or “always available” objects.
Active Directory is a… directory. It’s designed to share info. Specific sensitive attribute data can always be marked confidential and that’s probably really what you want here.
Doing this is one of the least useful security measures in a whole litany of things that you probably haven’t implemented – encrypting your LDAP traffic, using IPSEC everywhere, using two-factor smart cards for all user access, encrypting all drives, preventing physical removal of computers. Or making sure your web servers don’t allow ancient SQL injection attacks. Focus!
Just because you can do something does not mean you should do something. We provide an option to format your hard drive as well.
Strangely, two people asked about this in the past few weeks.
Question
Can USMT perform “incremental” or “differential” scans into a store? We have a lot of data to capture and it may take awhile, especially when going to a remote store. We’d like to do it in phases if possible.
Answer
Sorry, no. USMT completely deletes the destination store contents when you start a scanstate (this is why you have to specify /o if the store already exists). If you perform a hardlink migration though, you are not copying data and it will scan much faster than a classic store.
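For what it’s worth, a hardlink migration is just a couple of extra switches on scanstate; a sketch (the store path is a placeholder):

scanstate.exe c:\usmtstore /hardlink /nocompress /i:migdocs.xml /i:migapp.xml /o /c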
If you have to use a remote compressed classic store and you’re worried about reliability, run your scanstate to a local store location on the disk, then copy that store folder to a network location afterwards. Make sure you calculate space estimations to ensure you are not going to run out of disk, naturally.
Question
I don’t have any Win2008 servers – so I cannot use DFSDIAG.EXE– but I’d like to report on their DFS Namespace health. Are there other tools?
That will monitor health of Win2003 DFSN very well indeed. You can also use DFSDIAG via RSAT on Vista and Win7 clients; why do I suspect that you’re looking for a more… frugal… option, though? ;-P
The old DFSUTIL.EXE tool will stand in for DFSDIAG in a pinch, but it requires you to both run more commands and interpret the results carefully. It’s not going to spend much time explaining what’s wrong, so much as show you what it thinks is configured and let you decide if that’s wrong or not. Some of the more useful commands:
dfsutil.exe /root:<dfs name> /view /verbose
dfsutil.exe /server:<root server> /view
dfsutil.exe /domain:<domain> /view
dfsutil /sitename:<root server or dc or target or client>
USMT 4.0 only cares that you run it against a client OS SKU, and that it be XP or later. The download is a CAB file and doesn’t have any OS checking for installation; only scanstate and loadstate enforce the OS. If you dig into the nugget of that main KB at the bottom you will see only:
The reason it lists the OS on the download page is it has to say something, and USMT is built from the Windows 7/R2 source tree. So there you go.
Awesome Technique for Win7/2008 R2 Network Captures
Not a question, but a cool method that is too small to rate a full blog post: if you need to get a network capture on a Windows 7 or Windows Server 2008 R2 computer and you do not have or want Netmon installed, you can use NETSH.EXE. From an elevated CMD prompt run:
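(The exact commands aren’t shown here; something along these lines works, with the trace file path being whatever you like:)

netsh trace start capture=yes tracefile=c:\temp\capture.etl

Reproduce the issue, then:

netsh trace stop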
Open that file in Netmon 3.4 and you get all the usual capture info, plus other conversation and process info. AND other cool stuff – open the CAB file it created and you find a bunch of useful files with IP info, firewall event logs, applied group policies, driver versions, and more. All the goo I gather manually when I am getting a capture. Sweet!
Thanks to Tim “Mighty” Quinn for demoing this here.
Other Stuff
A few years ago TechNet Magazine stopped printing paper copy and switched to a web-only format. I lost track of them after that, but this weekend, I started going through their online versions from 2010 and 2011. It turns out there’s good stuff I’d been missing. Here are a few cherry picked articles; feel free to point out some other favorites in the Comments:
An interesting explanation of what Beta used to mean, and what it means now, from a Principal SDE who has been developing Windows since the Tithonian age. Heck, his blog is ready to collect Social Security.
How to be an effective troubleshooter. Don’t stop reading just because the author is an Office expert; it’s applicable across all aspects of IT. A truly excellent article that should be required reading for new admins.
An easy technique to take harsh text output and turn it into fluffy HTML. Perfect for punching up reporting to show your manager with zero extra effort, leaving more time for you to work on real issues. Or, you know, see your children grow up. Cat’s in the cradle and the silvaaaaah spoooon…
Yes please! If you have a friend that admins SharePoint, share this with them. In fact, bribe them to follow it. Whatever it takes. NTLM is the Devil and SharePoint feeds him jalapenos.
The Daily Mail was granted a “rare and remarkable” interview with Bill Gates last week. It’s a very interesting read.
Remember when I said yesterday that it sucks to use the Internet in Australia and Canada? Well it sucks in other places too… The article isn’t what I’d call “complete” (it misses 98% of the world and doesn’t include my gigantic US ISP, Time Warner, for example – TW doesn’t care if I download 5 TB or 5KB, as fast and as often as I like, as long as I pay on time; I use Sprint for my phone for the very same reason – flat rate unlimited data without metering rules). A nifty piece – I recommend the comments.
Hi guys, Joji Oshima here again. Today I want to talk about configuring Kerberos authentication to work in a load-balanced environment. This is a more advanced topic that requires a basic understanding of how Kerberos works. If you want an overview of Kerberos, I would suggest Rob’s excellent post, Kerberos for the Busy Admin. In this post, I will be using a load balanced IIS web farm as the example, but the principle applies to other applications.
If you are using 2008 Server or higher, you can view this attribute using Active Directory Users and Computers (ADUC) with Advanced Features enabled and going to the Attribute Editor tab. Click the View menu and then select Advanced Features.
You can also view attributes using ADSI Edit (adsiedit.msc) or the Setspn command line tool.
When an application makes a request for a Kerberos ticket, it makes that request for a specific SPN (like http/server01.contoso.com). The Key Distribution Center (KDC) will search Active Directory for the object that has that principal name registered to it, and encrypt the ticket with that object’s password. The object that is running the service has the same password, so when the ticket arrives, it can decrypt it.
The Problem:
If you have a single IIS server, the service is typically running under Local System. The standard SPNs are registered to the computer account (like host/server01 & host/server01.contoso.com)* so when a request for http/server01 comes in, the ticket will be encrypted using the computer account’s password. This configuration works well for a single server environment.
*The host SPN works for many services including http. If there is a specific entry for http, it will use that, otherwise it will fallback and use host.
In a load-balanced environment, users will access the service using a unified name instead of the individual servers. Therefore, instead of accessing server01.contoso.com or server02.contoso.com, they will access myapplication.contoso.com. In this scenario, there are two computer accounts, so where do you register the Service Principal Name? One idea would be to register the principal name to both computer accounts. The problem with this idea is that Service Principal Names must be unique. When the request comes in for http/myapplication.contoso.com, the KDC would not know which object’s password to encrypt the ticket with, so it will return an error. You will also see Event ID 11 populating your event logs if you have any duplicate SPNs in your directory.
The Solution:
Instead of running the service under Local System, have each server run the application using a specific service account. In IIS, you can accomplish this by having the application pool run under the service account. Here is how you would set it in IIS7. You could also follow the instructions in this TechNet article.
1. Open Internet Information Services (IIS Manager) and browse to the Application Pools page
4. This will bring up a dialog box where you can choose what credentials to use for the application pool. Choose Custom account and click the Set button.
5. Enter the credentials for your service account and click OK.
6. If you are using IIS7 and have Kernel Mode Authentication set, you will need to do one additional step. Open the ApplicationHost.config file and enable the useAppPoolCredentials setting. IIS7 added the option to authenticate users in Kernel mode to speed up the authentication process. By default, it will use the computer account for authentication requests even if the application pool is set to a service account. By changing this setting, you get the benefits of Kernel Mode Authentication, but still authenticate with the service account.
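For reference, the relevant bit of ApplicationHost.config ends up looking something like the line below (it typically lives inside the <location> element for your site; treat this as a sketch, not the whole file):

<windowsAuthentication enabled="true" useKernelMode="true" useAppPoolCredentials="true" />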
After you have the services running under the same service account, register the unified name, http/myapplication.contoso.com, to that service account. No matter which server the client is routed to, the service will be able to decrypt the ticket using its password. You can register SPNs using the command line tool Setspn.
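For example, something like this (the service account name is made up; older versions of Setspn use -a instead of -s):

setspn -s http/myapplication.contoso.com contoso\svc_webfarm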
Suppose you want the ability to use Kerberos authentication accessing the servers individually and using the unified name. Currently, if you request a ticket for http/server01.contoso.com, the KDC will encrypt the ticket using server01’s computer object password. The service is not running under local system, so it will not be able to decrypt that ticket. However, you can register additional SPNs to the service account. In this scenario, you could register the following SPNs to the service account.
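(The original list isn’t reproduced here, but presumably it covers both the unified name and the individual servers, along these lines:)

http/myapplication.contoso.com
http/myapplication
http/server01.contoso.com
http/server01
http/server02.contoso.com
http/server02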
You can also view the attributes currently registered to an account using the Setspn command line tool. Syntax: Setspn –l accountname
This will not interfere with the host SPNs registered to the computer account. When an incoming request comes for http/server01, it will check for the exact string first. If it cannot find it, it will look for host/server01.
Problems:
Remember that a Service Principal Name can only be registered on one account at a time. If you are using 2008 Server or higher, you can search for duplicate SPNs in your environment by using the command: Setspn -f -q http/myapplication*
You can also use the command line tool LDIFDE to find duplicate SPNs. The command below will output a text file called SPN.txt that contains all objects with a service principal name that starts with http/myapplication. This file will be located in the same directory you run the command in unless you specify a path in the –f switch.
The –r switch determines the search criteria, and the * at the end is a wildcard
The –l switch chooses which attributes will be listed in the output file
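(The command itself isn’t shown above; based on those switch descriptions, it’s something along these lines:)

ldifde -f SPN.txt -r "(serviceprincipalname=http/myapplication*)" -l serviceprincipalname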
Final Thoughts:
There are many benefits to using Kerberos authentication but configuring it properly may feel like a daunting task, especially in a more complex environment. I hope this post makes configuring this a bit easier.
Hello again, this is guest author Herbert from Germany.
It’s harder to let go of old components and protocols than dropping old habits. But, I’m falling back to an old habit myself…there goes the New Year resolution.
Quite recently we were faced with a new aspect of an old story. We hoped this problem would cease to exist as customers move forward with Kerberos-based solutions and other methods that facilitate Kerberos, such as smartcard PKINIT.
Yes, there are still some areas where we have to use NTLM for the sake of compatibility or absence of a domain controller. One of the most popular scenarios is disconnected clients using RPC over HTTP to connect to an Exchange mailbox. Another one is web proxy servers - which still often use NTLM although they and most browsers support Kerberos also.
With RPC over HTTP you have two discrete NTLM authentications: the outer HTTP session is authenticated on the frontend server and the inner RPC authentication is done on the mailbox server. The NTLM load from proxy servers can be even worse - as each TCP session has to be authenticated - and some browsers frequently recycle their sessions.
One way or the other, you end up with a high rate of NTLM authentication requests. And you may have already found the “MaxConcurrentAPI“ parameter, which is the number of concurrent NTLM authentications processed by the server. Historically there has been constant talk about a default of 2. However, the defaults are quite different:
Member-Workstation: 1
Member-Server: 2
Domain Controller: 1
The limit applies per Secure Channel. Members can only have one secure channel to a DC in the domain of which they are a member. Domain Controllers have one Secure Channel per trusted domain. However, as many customers follow a functional domain model of “user domains” and “resource domains”, the list of domains actually used for authentication is low and thus DCs are limited to 1 concurrent authentication for a certain “user domain”. Check out this diagram:
In this diagram, you see authentication requests started against servers in the left-hand forest as colored boxes by users in the right-hand forest. We are using the default values of MaxConcurrentAPI. The requests are forwarded along the trust paths to the right-hand forest. The trust paths used are shown by the arrows.
Now you see that on each resource forest DC up to 2 requests from member resource servers are queued. On the downstream DC, you get a maximum of 1 request from the grand-child domain. The same applies to the forest root DC. In this case, the only active authentication call for forest 1 is for the forest 2 grand-child domain, shown with brown API slots and arrows. Now that’s a real convoy…
The hottest link is between the forest root domains as every NTLM request needs to travel through the secure channels of forest1 root DCs with forest2 root DCs.
From the articles you may know “MaxConcurrentAPI” can be increased to 10 with a registry change. Well, Windows Server 2008 and Windows Server 2008 R2 have an update which pushes the limit to 150:
975363 A time-out error occurs when many NTLM authentication requests are sent from a computer that is running Windows Server 2008 R2, Windows 7, Windows Server 2008, or Windows Vista in a high latency network
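For reference, the value lives under the Netlogon parameters key; a sketch of setting it to 10 (pick a value appropriate for your environment, and restart the Netlogon service afterwards):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters" /v MaxConcurrentApi /t REG_DWORD /d 10 /f
net stop netlogon && net start netlogon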
This should be of some help… In addition, Windows Server 2008 and later include a performance object called “Netlogon” which allows you to monitor the throughput, load, and duration of NTLM authentication requests. You can add that to Windows Server 2003 using an update:
928576 New performance counters for Windows Server 2003 let you monitor the performance of Netlogon authentication
The article also offers a description of the counters. When you track the performance object you notice each secure channel is visible as a separate instance. This allows you to track activity per domain, what DCs are used and whether there are frequent fail-overs.
Beyond the article, these are our recommendations regarding performance baselines and alerts:
Performance counter
Recommendation
Semaphore Waiters
All semaphores are busy; there are threads, and thus logons, waiting in the queue. This counter is a candidate for a warning.
Semaphore Holders
This is the number of currently active callers. This is a candidate for a baseline to monitor. If this is approaching your maximum setting in baselines, you need to act.
Semaphore Acquires
This counts the total # of requests over this secure channel. When the secure channel fails and is reestablished, the count restarts from 0. Check the _Total instance for a counter for the whole server. Good to monitor the trend in baselines.
Semaphore Timeouts
An authentication thread has hit the time-out while waiting, and the logon was denied. So the logon was slow, and then it failed. This is a very bad user experience and the secure channel is overloaded, hung or broken. Also check the _Total instance.
This is ALERT material.
Average Semaphore Hold Time
This should provide the average response time quite nicely. This is also a candidate for baseline monitoring for trends.
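If you want to sample these outside of Perfmon for your baseline, typeperf works; a sketch using the counter names above and the _Total instance, sampling every 60 seconds:

typeperf "\Netlogon(_Total)\Semaphore Waiters" "\Netlogon(_Total)\Semaphore Timeouts" -si 60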
When it comes to discussing secure channels and maximum concurrency and queue depth, you also have to talk about how the requests are routed. Within a forest, you notice that the requests are sent directly to the target user domain.
When Netlogon finds that the user account is from another forest, it however has to follow the trust path, similar to what a Kerberos client would do (just the opposite direction). So the requests are forwarded to the parent domain and eventually arrive at the forest root DCs and from there across the forest boundary. You can easily imagine the Netlogon Service queues and context items look like rush hour at the Frankfurt airport.
So who cares?
You might say that besides the domains becoming bigger nowadays, there’s not a lot of news for folks running Exchange or big proxy server farms. Well, recently we became aware of a new source of NTLM authentication requests that was in the system for quite some time, but that now has reared its head. Recently customers have decided to turn this on, perhaps due to recommendations in a few of our best practices guides. We’re currently working on having these updated.
RPC Interface Restriction was introduced in Windows XP Service Pack 2 and Windows Server 2003 Service Pack 1 and offers the option to force authentication for all RPC Endpoint Mapper requests. The goal was to prevent anonymous attacks on the service. The goal may also have been avoiding denial of service attacks, but that one did not pan out very well. The details are described here:
In this description, the facility is hard-coded to use NTLM authentication. Starting with Windows 7, the feature can also use Kerberos for authentication. So this is yet another reason to update.
The server will only require authentication (reject anonymous clients) if “RestrictRemoteClients” is set to 1 or higher. When you have the combination of applications with dynamic endpoints, many clients, and frequent reconnects in a deployment, you get a substantial number of authentications.
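If you want to check whether a server has this turned on, the policy value lives under the RPC policies key; a quick check (the value won’t exist if the policy has never been configured):

reg query "HKLM\SOFTWARE\Policies\Microsoft\Windows NT\RPC" /v RestrictRemoteClients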
Some of the customers affected were quite surprised about the NTLM authentication volume, as they had everything configured to use Kerberos on their proxy servers and Exchange running without RPC over HTTP clients.
Exchange with MAPI clients is an application architecture that uses many different RPC interfaces, all using Endpoint Mapper. The list includes Store, NSPI, Referrer plus a few operating system interfaces like LSA RPC, each one of them triggering NTLM authentications. The bottleneck is then caused by the queuing of requests, done in each hop along the trust path.
Similar problems may happen with custom applications using RPC or DCOM to communicate. It all comes down to the rate of NTLM authentications induced on the AD infrastructure.
In our testing we found that not all RPC interfaces are happy with a secured endpoint mapper; see Ned’s blog post on the subject.
What are customers doing about it?
Most customers are then going to increase “MaxConcurrentAPI”, which provides relief. Many customers also add monitoring of Netlogon performance counters to their baseline. We also have customers who start to use secure channel monitoring, and when they see that a DC is accumulating incoming secure channels, they use “nltest /SC_RESET” to balance resource domain controllers or member servers evenly across the downstream domain controllers.
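For example, to force a member server or DC to pick a different DC for its secure channel to a given domain (the names here are placeholders):

nltest /server:EXCH01 /sc_reset:contoso.com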
And yes, one way out of this is also setting the RPC registry entries or group policy to the defaults, so clients don’t attempt NTLM authentication. Since this setting was often required by the security department, it is probably not being changed in all cases. Some arguments that the secure Endpoint Mapper may not provide significant value are as follows:
1. The call is only done to get the server TCP port. The communication to the server typically is authenticated separately.
2. If the firewall does not permit incoming RPC endpoint mapper request from the Internet, the callers are all from the internal network. Thus no information is disclosed to outside entities if the network is secure.
3. There are no known vulnerabilities in the endpoint mapper. It was once justified when there were vulnerabilities, but not today.
4. If you can’t get the security policy changed, ask the IT team to expedite Windows 7 deployment as it does not cause NTLM authentication in this scenario.
Ah, those old habits, they always come back on you. I hope you now have the tools and countermeasures to make all this more bearable.
Update 5/1/2012:
There is an update available that adds NetLogon events 5816-5819 when you experience Semaphore Waiters and Semaphore Timeouts. This will allow you to find bottlenecks in your trust graph quickly. Check out:
We plan to migrate our Certificate Authority from single-tier online Enterprise Root to two-tier PKI. We have an existing smart card infrastructure. TechNet docs don’t really speak to this scenario in much detail.
1. Does migration to a 2-tier CA structure require any customization?
2. Can I keep the old CA?
3. Can I create a new subordinate CA under the existing CA and take the existing CA offline?
While you can migrate an online Enterprise Root CA to an offline Standalone Root CA, that probably isn't the best decision in this case with regard to security. Your current CA has issued all of your smart card logon certificates, which may have been fine when that was all you needed, but it certainly doesn't comply with best practices for a secure PKI. The root CA of any PKI should be long-lived (20 years, for example) and should only issue certificates to subordinate CAs. In a 2-tier hierarchy, the second tier of CAs should have much shorter validity periods (5 years) and is responsible for issuing certificates to end entities. In your case, I'd strongly consider setting up a new PKI and migrating your organization over to it. It is more work at the outset, but it is a better decision long term.
You can keep the currently issued certificates working by publishing a final, long-lived CRL from the old CA. This is covered in the first blog post above. This would allow you to slowly migrate your users to smart card logon certificates issued by the new PKI as the old certificates expired. You would also need to continue to publish the old root CA certificate in the AD and in the Enterprise NTAuth store. You can see these stores using the Enterprise PKI snap-in: right-click on Enterprise PKI and select Manage AD Containers. The old root CA certificate should be listed in the NTAuthCertificates tab, and in the Certificate Authorities Container tab. Uninstalling the old CA will remove these certificates; you'll need to add them back.
You can't take an Enterprise CA offline. An Enterprise CA requires access to Active Directory in order to function. You can migrate an Enterprise CA to a Standalone CA and take that offline, but, as I've said before, that really isn't the best option in this case.
Question
Are there any know issues with P2Ving ADAM/AD LDS servers?
Answer
[Provided by Kim Nichols, our resident ADLDS guru'ette - Editor]
No problems as far as we know. The same rules apply as P2V’ing DCs or other roles; make sure you clean up old drivers and decommission the physicals as soon as you are reasonably confident the virtual is working. Never let them run simultaneously. All the “I should have had a V-8” stuff.
Considering how simple it is to create an ADLDS replica, it might be faster and "cleaner" to create a new virtual machine, install and replicate ADLDS to it, then rename the guest and throw away the old physical; if ADLDS was its only role, naturally.
Question
[Provided by Fabian Müller, our clever German PFE - Editor]
When using production delegation in AGPM, we can grant permissions for editing group policy objects in the production environment. But these permissions will be written to all deployed GPOs, not for specific ones. GPMC makes it easy to set “READ” and “APPLY” permissions on a GPO, but I cannot find a security filtering switch in AGPM. So how can we manage the security filtering on group policies without setting the same ACL on all deployed policies?
Answer
OK, granting “READ” and “APPLY” permissions (that is, managing security filtering) in AGPM is not that obvious to find. Do it like this in the Change Control panel of AGPM:
Check out the relevant Group Policy Object and provide a brief overview of the changes to be made in the “comments” window, e.g. “Add important security filtering ACLs for group XYZ, dude!”
Edit the checked-out GPO
In the top of the Group Policy Management Editor, click “Action” –> “Properties”:
Note 1: Be aware that you won’t find any information regarding the security filtering change in the AGPM history of the edited group policy object. There is nothing in the HTML reports that refers to security filtering changes. That’s why you should provide a good explanation of your changes during the check-in and check-out phases:
I have one Windows Server 2003 IIS machine with two web applications, each in its own application pool. How can I register SPNs for each application?
Answer
[This one courtesy of Rob Greene, the Abominable Authman - Editor]
There are a couple of options for you here.
You could address each web site on the same server with different host names. Then you can add the specific HTTP SPN to each application pool account as needed.
You could address each web site with a unique port assignment on the web server. Then you can add the specific HTTP SPN with the port attached like http/myweb.contoso.com:88
You could use the same account to run all the application pool accounts on the same web server.
NOTE: If you choose option 1 or 2, you have to be careful about Internet Explorer behaviors. If you choose the unique host name per web site then you will need to make sure to use HOST records in DNS or put a registry key in place on all workstations if you choose CNAME. If you choose having a unique port for each web site, you will need to put a registry key in place on all workstations so that they send the port number in the TGS SPN request.
Comparing AGPM controlled GPOs within the same domain is no problem at all – but if the AGPM server serves more than one domain, how can I compare GPOs that are hosted in different domains using AGPM difference report?
Answer
[Again from Fabian, who was really on a roll last week - Editor]
Since AGPM 4.0 we provide the ability to export and import Group Policy Objects using AGPM. What you have to do is:
… and import the *.cab to domain 2 using the AGPM GPO import wizard (right-click on an empty area in AGPM Contents—> Controlled tab and select “New Controlled GPO…”):
When I use the Windows 7 (RSAT) version of AD Users and Computers to connect to certain domains, I get the error "unknown user name or bad password". However, when I use the XP/2003 adminpak version, there are no errors for the same domain. There's no way to enter a domain or password.
Answer
ADUC in Vista/2008/7/R2 does some group membership and privilege checking when it starts that the older ADUC never did. You’ll get the logon failure message for any domain you are not a domain admin in, for example. The legacy ADUC is probably broken for that account as well – it’s just not telling you.
I have 2 servers replicating with DFSR, and the network cable between them is disconnected. I delete a file on Server1, while the equivalent file on Server2 is modified. When the cable is re-connected, what is the expected behavior?
Answer
Last updater wins, even if the modification is to an ostensibly deleted file. If the file was deleted first on server 1 and modified later on server 2, it replicates back to server 1 with the modifications once the network reconnects. If it had been deleted later than the modification, that “last write” would win and the file would be deleted from the other server once the network resumed.
Is there any automatic way to delete stale user or computer accounts? Something you turn on in AD?
Answer
Nope, not automatically; you have to create a solution to detect the age and disable or delete stale accounts. This is a very dangerous operation - make sure you understand what you are getting yourself into. For example:
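If you do go down that road, a minimal PowerShell sketch using the AD module might look like this – the 90-day window is arbitrary and the -WhatIf is deliberate, so nothing is actually disabled until you review the output:
Import-Module ActiveDirectory
# Find computer accounts that have been inactive for 90 days, then (pretend to) disable them.
Search-ADAccount -AccountInactive -ComputersOnly -TimeSpan 90.00:00:00 | Disable-ADAccount -WhatIf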
Whenever I try to use the PowerShell cmdlet Get-ACL against an object in AD, always get an error like " Cannot find path ou=xxx,dc=xxx,dc=xxx because it does not exist". But it does!
Answer
After you import the ActiveDirectory module, but before you run your commands, run:
CD AD:
Get-Acl won’t work until you change to the magical “active directory drive”.
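For instance (the OU below is hypothetical):
Import-Module ActiveDirectory
# The module creates the AD: drive; Get-Acl only resolves directory paths from there.
cd AD:
Get-Acl "ou=Sales,dc=contoso,dc=com" | Format-List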
Question
I've read the Performance Tuning Guidelines for Windows Server, and I wonder if all SMB server tuning parameters (AsyncCredits, MinCredits, MaxCredits, etc.) also work (or help) for DFSR. Also, do you know what the limit is for SMB Asynchronous Credits? The document doesn’t say.
Answer
Nope, they won’t have any effect on DFSR – it does not use SMB to replicate files. SMB is only used by the DFSMGMT.MSC if you ask it to create a replicated folder on another server during RF setup. More info here:
That AsynchronousCredits SMB value does not have a true maximum, other than the fact that it is a DWORD and cannot exceed 4,294,967,295 (i.e. 0xffffffff). Its default value on Windows Server 2008 and 2008 R2 is 512; on Vista/7, it's 64.
HOWEVER!
As KB938475 (http://support.microsoft.com/kb/938475) points out, adjusting these defaults comes at the cost of paged pool (Kernel) memory. If you were to increase these values too high, you would eventually run out of paged pool and then perhaps hang or crash your file servers. So don't go crazy here.
There is no "right" value to set - it depends on your installed memory, if you are using 32-bit versus 64-bit (if 32-bit, I would not touch this value at all), the number of clients you have connecting, their usage patterns, etc. I recommend increasing this in small doses and testing the performance - for example, doubling it to 1024 would be a fairly prudent test to start.
Other Stuff
Happy Birthday to all US Marines out there, past and present. I hope you're using Veterans Day to sleep off the hangover. I always assumed that's why they made it November 11th, not that whole WW1 thing.
Also, happy anniversary to Jonathan, who has been a Microsoft employee for 15 years. In keeping with tradition, he brought in 15 pounds of M&Ms for the floor, which, in case you’re wondering, fills a salad bowl. Which around here, means:
A great baseball story about Lou Gehrig, Kurt Russell, and a historic bat.
Off to play some Battlefield 3. No wait, Skyrim. Ah crap, I mean Call of Duty MW3. And I need to hurry up as Arkham City is coming. It's a good time to be a PC gamer. Or Xbox, if you're into that sorta thing.
Have a nice weekend folks,
- Ned "and Jonathan and Kim and Fabian and Rob" Pyle
Is there an "official" stance on removing built-in admin shares (C$, ADMIN$, etc.) in Windows? I’m not sure this would make things more secure or not. Larry Osterman wrote a nice article on its origins but doesn’t give any advice.
Answer
The official stance is from the KB that states how to do it:
Generally, Microsoft recommends that you do not modify these special shared resources.
Even better, here are many things that will break if you do this:
That’s not a complete list – it wasn’t updated for Vista/2008 and later – but the idea is so bad that there’s frankly no point in extending it. Removing these shares does not increase security: only administrators can use them, and you cannot prevent administrators from putting them back or creating equivalent custom shares.
This is one of those “don’t do it just because you can” customizations.
Question
The Windows PowerShell Get-ADDomainController cmdlet finds DCs, but not much actual attribute data from them. The examples on TechNet are not great. How do I get it to return useful info?
Answer
You have to use another cmdlet in tandem: Get-ADComputer. The Get-ADDomainController cmdlet is good mainly for searching, and Get-ADComputer does not accept pipeline input from it. Instead, you use a pseudo “nested function” to first find the PDC, then get data about that DC. For example (this is all one command, wrapped):
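A sketch of that kind of nested command – reconstructed, so verify the parameter names in your environment – which finds the PDC and then pulls its computer object:
Get-ADComputer (Get-ADDomainController -Discover -Service "PrimaryDC").Name -Properties * | Format-List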
When you run this, PowerShell first processes the command within the parentheses, which finds the PDC. Then it runs Get-ADComputer, using the Name property returned by Get-ADDomainController. Then it passes the results through the pipeline to be formatted. So it’s 1-2-3.
Moreover, before the Internet clubs me like a baby seal: yes, a more efficient way to return data is to ensure that the –Properties list contains only the attributes you actually need:
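Something along these lines, asking only for what you need (the attribute list is just an example):
Get-ADComputer (Get-ADDomainController -Discover -Service "PrimaryDC").Name -Properties DNSHostName,OperatingSystem,OperatingSystemServicePack | Format-List Name,DNSHostName,OperatingSystem,OperatingSystemServicePack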
The Get-ADDomain cmdlet can also find FSMO role holders and other big-picture domain stuff – for example, the RID Master that you need to monitor.
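A quick way to list the domain-level FSMO owners (sketch):
Get-ADDomain | Format-List PDCEmulator,RIDMaster,InfrastructureMaster
# Or grab just the one you care about:
(Get-ADDomain).RIDMaster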
Question
I know about Kerberos “token bloat” with user accounts that are members of too many groups. Does this also affect computers added to too many groups? What would be some practical effects of that? We want to use a lot of them in the near future for some application … stuff.
Answer
Yes, things will break. To demonstrate, I used PowerShell to create 2,000 groups in my domain and added a computer named “7-01” to all of them:
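The gist of it, if you want to reproduce the carnage in a lab (the OU path is made up, and please do not do this in production):
Import-Module ActiveDirectory
# Create 2,000 groups and add computer 7-01 to every one of them.
$computer = Get-ADComputer "7-01"
1..2000 | ForEach-Object {
    $group = New-ADGroup -Name "TokenBloat$_" -GroupScope Global -Path "ou=TokenBloat,dc=contoso,dc=com" -PassThru
    Add-ADGroupMember -Identity $group -Members $computer
}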
I then restarted the 7-01 computer. Uh oh, the System event log is un-pleased. At this point, 7-01 is no longer applying computer group policy, running startup scripts, or allowing any of its services to log on remotely to DCs:
I’m sure no one will go on a wild goose chase after seeing that message. Applications will be freaking out even more, likely with the oh-so-helpful error 0x80090350:
“The system detected a possible attempt to compromise security. Please ensure that you can contact the server that authenticated you.”
Don’t do it. MaxTokenSize is probably in your future if you do, and it has limits that you cannot design your way out of. IT uniqueness is bad.
Question
We have XP systems using two partitions (C: and D:) migrating to Windows 7 with USMT. The OS is on C: and the user profiles are on D:. We’ll use that D: partition to hold the USMT store. After migration, we’ll remove the second partition and expand the first partition to use the space freed up by the second.
When restoring via loadstate, will the user profiles end up on C or on D? If the profiles end up on D, we will not be able to delete the second partition obviously, and we want to stop doing that regardless.
Answer
You don’t have to do anything; it just works. Because the new profile destination is on C, USMT just slots everything in there automagically :). The profiles will be on C and nothing will be on D except the store itself and any non-profile folders*:
If users have any non-profile folders on D, that will require a custom rerouting xml to ensure they are moved to C during loadstate and not obliterated when D is deleted later. Or just add a MOVE line to whatever DISKPART script you are using to expand the partition.
Question
Should we stop the DFSR service before performing a backup or restore?
Answer
Manually stopping the DFSR service is not recommended. When backing up using the DFSR VSS Writer – which is the only supported way – replication is stopped automatically, so there’s no reason to stop the service or manually pause replication:
Event ID=1102 Severity=Informational The DFS Replication service has temporarily stopped replication because another application is performing a backup or restore operation. Replication will resume after the backup or restore operation has finished.
Event ID=1104 Severity=Informational The DFS Replication service successfully restarted replication after a backup or restore operation.
Another bit of implied evidence – Windows Server Backup does not stop the service.
Stopping the DFSR service for extended periods leaves you open to the risk of a USN journal wrap. And what if someone/something thinks that the service being stopped is “bad” and starts it up in the middle of the backup? Probably nothing bad happens, but certainly nothing good. Why risk it?
Question
In an environment where AGPM controls all GPOs, what is the best practice when application setup routines make edits "under the hood" to GPOs, such as the Default Domain Controllers GPO? For example, Exchange setup makes changes to User Rights Assignment (SeSecurityPrivilege). Obviously, if this setup process makes such edits on the live GPO in SYSVOL the changes will take effect, only to have those critical edits lost and overwritten the next time an admin redeploys the GPO with AGPM.
Answer
[via Fabian “Wunderbar” Müller – Ned]
From my point of view:
1. The Default Domain and Default Domain Controller Policies should be edited very rarely. Manual changes as well as automated changes (e.g. by the mentioned Exchange setup) should be well known and therefore the workaround in 2) should be feasible.
2. After those planned changes are performed, use “import from production” to bring the production GPO into the AGPM archive so that the production change is reflected in AGPM. Alternatively, you could periodically “import from production” the default policies, or implement a manual/human process that requires an “import from production” before one of these policies is changed using AGPM.
Not a perfect answer, but manageable.
Question
In testing the rerouting of folders, I took this example from TechNet and placed it in a separate custom.xml. When using this custom.xml along with the other defaults (migdocs.xml and migapp.xml unchanged), the EngineeringDrafts folder is copied to %CSIDL_DESKTOP%\EngineeringDrafts, but there’s also a copy at C:\EngineeringDrafts on the destination computer.
I assume this is not expected behavior. Is there something I’m missing?
Answer
If you have an <include> rule in one component and a <locationModify> rule in another component for the same file, the file will be migrated in both places. That is, it will be included based on the <include> rule, and it will be migrated based on the <locationModify> rule.
That original rerouting article could state this more plainly, I think. Hardly anyone does this relativemove operation; it’s very expensive in disk space – one of those “you can, but you shouldn’t” capabilities of USMT. The first example also has an invalid character in it (the apostrophe in “user’s” on line 12, position 91 – argh!).
Don’t just comment out those areas in migdocs though; you are then turning off most of the data migration. Instead, create a copy of the migdocs.xml and modify it to include your rerouting exceptions, then use that as your custom XML and stop including the factory migdocs.xml.
There’s an example attached to this blog post down at the bottom. Note the exclude in the System context and the include/modify in the user context:
Don’t just modify the existing migdocs.xml and keep using it un-renamed either; that becomes a versioning nightmare down the road.
Question
I'm reading up on CAPolicy.inf files, and it looks like there is an error in the documentation that keeps being copied around. TechNet lists RenewalValidityPeriod=Years and RenewalValidityPeriodUnits=20 under the "Windows Server 2003" sample. This is the opposite of the Windows 2000 sample, and intuitively the "PeriodUnits" should be something like "Years" or "Weeks", while the "Period" would be an integer value. I see this on AskDS here and here also.
Answer
[via Jonathan “scissor fingers” Stephens – Ned]
You're right that the two settings seem like they should be reversed, but unfortunately the documentation is correct as written. All of the *Period values can be set to Minutes, Hours, Days, Weeks, Months or Years, while all of the *PeriodUnits values should be set to some integer.
Originally, the two types of values were intended to be exactly what one intuitively believes they should be -- *PeriodUnits was to be Day, Weeks, Months, etc. while *Period was to be the integer value. Unfortunately, the two were mixed up early in the development cycle for Windows 2000 and, once the error was discovered, it was really too late to fix what is ultimately a cosmetic problem. We just decided to document the correct values for each setting. So in actuality, it is the Windows 2000 documentation that is incorrect as it was written using the original specs and did not take the switch into account. I’ll get that fixed.
Question
Is there a way to control the number, verbosity, or contents of the DFSR cluster debug logs (DfsrClus_nnnnn.log and DfsrClus_nnnnn.log.gz in %windir%\debug)?
Answer
Nope, sorry. It’s all statically defined:
Severity = 5
Max log messages per log = 10000
Max number of log files = 999
Question
In your previous article you say that any registry modifications should be completed with a resource restart (take the resource offline and bring it back online) instead of a direct service restart. However, the official whitepaper (on page 16) says that the CA service should be restarted by using "net stop certsvc && net start certsvc".
Also, I want to clarify something about a clustered CA database backup/restore. Say a DB was damaged or destroyed and I have a full backup of the CA DB. Before restoring, do I stop only the AD CS service resource (cluadmin.msc), or stop the CA service directly (net stop certsvc)?
Answer
[via Rob “there's a Squatch in These Woods” Greene – Ned]
The CertSvc service has no idea that it belongs to a cluster. That’s why you set up the CA as a generic service within Cluster Administrator and configure the CA registry hive within Cluster Administrator.
When you update the registry keys on the active CA cluster node, the Cluster service is monitoring the registry key changes. When the resource is taken offline, the Cluster service makes a new copy of the registry keys so that the other node gets the update. When you stop and start the CA service directly, the Cluster service has no idea why the service stopped and started, since it is being done outside of the cluster, and those registry key settings are never updated on the stand-by node. General guidance for clusters is to manage resource state (stop/start) within Cluster Administrator, not through Services.msc, NET STOP, SC, etc.
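On Windows Server 2008 R2 you can do the same thing from PowerShell with the FailoverClusters module; the resource name below is hypothetical, so use whatever yours is actually called:
Import-Module FailoverClusters
# Take the CA resource offline and bring it back so the cluster copies the registry checkpoint to the other node.
Stop-ClusterResource -Name "Certification Authority"
Start-ClusterResource -Name "Certification Authority"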
As far as the CA Database restore: just logon to the Active CA node and run the certutil or CA MMC to perform the operation. There’s no need to touch the service manually.
Other stuff
The Microsoft Premier Field Organization has started a new blog that you should definitely be reading.
Hi folks, Ned here again. I know this is supposed to be the Friday Mail Sack but things got a little hectic and... ah heck, it doesn't need explaining, you're in IT. This week - with help from the ever-crotchety Jonathan Stephens - we talk about:
Now that Jonathan's Rascal Scooter has finished charging, on to the Q & A.
Question
We want a group policy linked to an OU containing various computers to apply only to our Windows 7 notebooks. All of our notebooks have names starting with "N". Does group policy WMI filtering allow stacking multiple conditions on the same group policy?
Answer
Yes, you can chain together multiple query criteria, and they can even be from different classes or namespaces. For example, here I use both the Win32_OperatingSystem and Win32_ComputerSystem classes:
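The criteria were WQL queries roughly like the ones below (reconstructed, not the exact screenshot); you can sanity-check them locally with Get-WmiObject before pasting them into a WMI filter:
# Version 6.1 = Windows 7 / Windows Server 2008 R2
Get-WmiObject -Query 'SELECT * FROM Win32_OperatingSystem WHERE Version LIKE "6.1%"'
# Computer name starts with N
Get-WmiObject -Query 'SELECT * FROM Win32_ComputerSystem WHERE Name LIKE "N%"'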
As long as they all evaluate TRUE, you get the policy. If you had a hundred of these criteria (please don’t) and 99 evaluate true but just one is false, the policy is skipped.
Note that my examples above would catch Win2008 R2 servers also; if you’ve read my previous posts, you know that you can limit queries to client operating systems using the Win32_OperatingSystem property OperatingSystemSKU. Moreover, if you hadn’t used a predictable naming convention, you could also filter with Win32_SystemEnclosure and query the ChassisTypes property for 8, 9, or 10 (respectively “Portable”, “Laptop”, and “Notebook”). And no, I do not know the difference between those; it is OEM-specific. Just like “pizza box” is for servers. You stay classy, WMI.
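If you want to see what chassis value a given machine reports before you build that filter, a quick local check:
# ChassisTypes is an array; 8 = Portable, 9 = Laptop, 10 = Notebook
(Get-WmiObject Win32_SystemEnclosure).ChassisTypes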
MaxPoolThreads controls the maximum number of simultaneous threads per processor that a DC uses to work on LDAP requests. By default, it’s four per processor core. Increasing this value allows a DC/GC to handle more concurrent LDAP requests, so if you have too many LDAP clients talking to too few DCs at once, raising it can reduce LDAP application timeouts and periodic “hangs”. As you might have guessed, the biggest complainers here are often MS Exchange and Outlook. If the performance counters "ATQ Threads LDAP" and "ATQ Threads Total" are constantly at the maximum (the number of processors multiplied by the MaxPoolThreads value), then you are bottlenecked on LDAP.
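To watch those counters from PowerShell, something like the following works – the counter paths are from memory, so verify them in Perfmon if they don’t resolve:
Get-Counter -Counter '\DirectoryServices(NTDS)\ATQ Threads LDAP','\DirectoryServices(NTDS)\ATQ Threads Total' -SampleInterval 5 -MaxSamples 12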
However!
DCs are already optimized to return data from LDAP requests quickly. If your hardware is even vaguely new and you are not seeing actual issues, you should not increase this default value. MaxPoolThreads depends on non-paged pool memory, which on a 32-bit Win2003 OS is limited to 256 MB (more on 32-bit Win2008). That means if you still have not moved to at least x64 Windows Server 2003, don’t touch this value at all – you can easily hang your DCs. It also means you need to get with the times; we stopped making a 32-bit server OS nearly three years ago and OEMs stopped selling the hardware even before that. A 64-bit system’s non-paged pool limit is 128 GB.
In addition, changing the LDAP settings is often a Band-Aid that doesn’t address the real issue: DC capacity for your client/server base. Use SPA or AD Data Collector Sets to determine "Clients with the Most CPU Usage" under the "LDAP Requests" section, especially if the LDAP queries are not just frequent but also gross. There is also built-in diagnostic logging to find poorly written requests:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics\ 15 Field Engineering
To categorize search operations as expensive or inefficient, two DWORD registry keys are used:
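For reference, I believe these are the “Expensive Search Results Threshold” and “Inefficient Search Results Threshold” values under NTDS\Parameters – treat the names, and the idea of reading them like this, as an assumption and double-check the documentation before relying on it:
# Assumed value names; if they are absent, the built-in defaults are in effect.
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Parameters' | Select-Object 'Expensive Search Results Threshold','Inefficient Search Results Threshold'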
(The title should be altered to “Creating even slightly efficient…” in my experience).
Question
I want to implement many-to-one certificate mappings by using Issuer and Subject DN match. In altSecurityIdentities I put the following string:
X509:<I>DC=com,DC=contoso,CN=Contoso CA<S>DC=com,DC=contoso,CN=users,CN=user name
In the given example, a certificate with “cn=user name, cn=users, dc=contoso, dc=com” in the Subject field will be mapped to the user account where I define the mapping. But that gives me a one-to-one mapping. Can I use wildcards here, say:
So that any certificate that contains “cn=<any value>, cn=users, dc=contoso, dc=com” will be mapped to the same user account?
Answer
[Sent from Jonathan while standing in the 4PM dinner line at Bob Evans]
Unfortunately, no. All that would do is map a certificate with a wildcard subject to that account. The only type of one-to-many mapping supported by the Active Directory mapper is configuring it to ignore the subject completely. Using this method, you can configure the AD mappings so that any certificate issued by a particular CA can be mapped to a single user account. See the following: http://technet.microsoft.com/en-us/library/bb742438.aspx#ECAA
Question
I've recently been working on extending my AD schema with a new back-linked attribute pair, and I used the instructions on this blog and MSDN to auto-generate the linkIDs for my new attributes. Confusingly, the resulting linkIDs are negative values (-912314983 and -912314984). The attributes and backlinks seem to work as expected, but when looking at the MSDN definition of the linkID attribute, it specifically states that the linkID should be a positive value. Do you know why I'm getting a negative value, and if I should be concerned?
Answer
[Sent from Jonathan’s favorite park bench where he feeds the pigeons]
The negative numbers are correct and expected, and are the result of a feature called AutoLinkID. Automatically generated linkIDs are in the range of 0xC0000000-0xFFFFFFFC (-1,073,741,824 to -4). This means that it is a good idea to use positive numbers if you are going to set the linkID manually. That way you are guaranteed not to conflict with automatically generated linkIDs.
The bottom line is, this is expected under the circumstances and you're all good.
Question
Is there any performance advantage to turning off the DFSR debug logging, lowering the number of logs, or moving the logs to another drive? You explained how to do this here in the DFSR debug series, but never mentioned it in your DFSR performance tuning article.
Answer
Yes, you will see some performance improvements turning off the logging or lowering the log count; naturally, all this logging isn’t free, it takes CPU and disk time. But before you run off to make changes, remember that if there are any problems, these logs are the only thing standing between you and the unemployment line. Your server will be much faster without any anti-virus software too, and your company’s profits higher without fire insurance; there are trade-offs in life. That’s why – after some brief agonizing, followed by heavy drinking – I decided not to include it in the performance article.
Moving the logs to a different physical disk than the one hosting Windows is safe and may take some pressure off the OS drive.
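As a refresher, those knobs hang off the DfsrMachineConfig WMI class; a hedged sketch follows (the property names and log path are from memory, so verify them against the debug series post):
# List the current settings, then adjust them - check the real property names with Format-List * first.
$cfg = Get-WmiObject -Namespace root\microsoftdfs -Class DfsrMachineConfig
$cfg | Format-List *
$cfg.MaxDebugLogFiles = 100
$cfg.DebugLogFilePath = 'D:\dfsrlogs'
$cfg.Put()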
Question
When I try to join this Win2008 R2 computer to the domain, it gives an error I’ve never seen before:
"The following error occurred attempting to join the domain "contoso.com": The request is not supported."
Answer
This server was once a domain controller. During demotion, something prevented the removal of the following registry value name:
Delete that "Dsa Database File" value name and attempt to join the domain again. It should work this time. If you take a gander at the %systemroot%\debug\netsetup.log, you’ll see another clue that this is your issue:
NetpIsTargetImageADC: Determined this is a DC image as RegQueryValueExW loaded Services\NTDS\Parameters\DSA Database file: 0x0
NetpInitiateOfflineJoin: The image at C:\Windows\system32\config\SYSTEM is a DC: 0x32
We started performing this check in Windows Server 2008 R2, as part of the offline domain join code changes. Hurray for unintended consequences!
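If you would rather do the cleanup from PowerShell than regedit, something like this (the key path is taken from the netsetup.log line above):
# Remove the leftover value that makes the join code think this box is a DC image.
Remove-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Parameters' -Name 'DSA Database file'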
Question
We have a largish AD LDS (ADAM) instance that we update daily by importing CSV files that delete all of yesterday’s user objects and import today’s. Since we don’t care about deleted objects, we reduced the tombstoneLifetime to 3 days. The NTDS.DIT usage, as shown by the 1646 Garbage Collection event, shows 1336 MB free out of a total allocation of 1550 MB – which suggests there is about 214 MB of data in the database.
The problem is that Task Manager shows a total of 1,341,208 K of Memory (Private Working Set) in use. The memory usage drops to around 214 MB when AD LDS is restarted; however, when garbage collection runs, the memory usage starts to climb again. I have read many KB articles regarding garbage collection but nothing explains what I am seeing here.
Answer
Generally speaking, LSASS (and DSAMAIN, its red-headed AD LDS cousin) is designed to allocate and retain more memory – especially ESE (aka “Jet”) cache memory – than ordinary processes, because LSASS/DSAMAIN are the core processes of a DC or AD LDS server. I would expect memory usage to grow heavily during the import, the deletions, and then garbage collection; unless something else puts pressure on the machine for memory, I’d expect the memory usage to remain. That’s how well-written Jet database applications work – they don’t give back the memory unless someone asks, because LSASS and Jet can reuse it much faster when needed if it’s already loaded; why return memory if no one wants it? That would be a performance bug unto itself.
The way to show this in practical terms is to start some other high-memory process and validate that DSAMAIN starts to return the demanded memory. There are test applications like this on the internet, or you can install some app that likes to gobble a lot of RAM. Sometimes I’ll just install Wireshark and load a really big saved network capture – that will do it in a pinch. :-D You can also use the ESE performance counters under the “Database” and “Database ==> Instances” to see more about how much of the memory usage is Jet database cache size.
Regular DCs have this behavior too, as do DFSR and other applications. You paid for all that memory; you might as well use it.
(Follow up from the customer where he provided a useful PowerShell “memory gobbler” example)
I ran the following Windows PowerShell script a few times to consume all available memory and the DSAMAIN process started releasing memory immediately as expected:
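The customer’s script isn’t reproduced here; a crude stand-in that chews up memory the same way might look like this (lab use only – stop it with Ctrl+C):
# Deliberately wasteful memory gobbler: keep appending 100 MB strings until you stop it.
$hog = New-Object System.Collections.ArrayList
while ($true) {
    [void]$hog.Add('x' * 100MB)
    Start-Sleep -Milliseconds 200
}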
Pinned and Recent jump lists are not migrated by USMT, because the built-in OS Shell32 manifest called by USMT (c:\windows\winsxs\manifests\*_microsoft-windows-shell32_31bf3856ad364e35_6.1.7601.17514_non_ca4f304d289b7800.manifest) contains this specific criterion:
Note how it is not Recent\* [*], which would grab the subfolder contents of Recent; it only copies the direct file contents of Recent. The pinned/automatic jump lists are stored in special files under the CustomDestinations and AutomaticDestinations folders inside the Recent folder. All the other contents of Recent are shortcut files to recently opened documents anywhere on the system:
Since these files are binary and embed all their data in a big blob of goo, they cannot simply be copied safely between operating systems using USMT. The paths they reference could easily change in the meantime, or the data they reference could have been intentionally skipped. The only way this would work is if the Shell team extended their shell migration plugin code to handle it. Which would be a fair amount of work, and at the time these manifests were being written, customers were not going to be migrating from Win7 to Win7. So no joy. You could always try copying them with custom XML, but I have no idea if it would work at all and you’re on your own anyway – it’s not supported.
Question
We have a third party application that requires DES encryption for Kerberos. It wasn’t working from our Windows 7 clients though, so we enabled the security group policy “Network security: Configure encryption types allowed for Kerberos” to allow DES. After that though, these Windows 7 clients stopped working in many other operations, with event log errors like:
Event ID: 4 Source: Kerberos Type: Error "The kerberos client received a KRB_AP_ERR_MODIFIED error from the server host/myserver.contoso.com. This indicates that the password used to encrypt the kerberos service ticket is different than that on the target server. Commonly, this is due to identically named machine accounts in the target realm (domain.com), and the client realm. Please contact your system administrator."
And “The target principal name is incorrect” or “The target account name is incorrect” errors connecting to network resources.
Answer
When you enable DES on Windows 7, you need to ensure you are not accidentally disabling the other cipher suites. So don’t do this:
If it’s set to 0x3, all heck will break loose. This security policy interface is admittedly tiresome in that it has no “enabled/disabled” toggle. Use GPRESULT /H or /Z to see how it’s applying if you’re not sure about the actual settings.
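If you want to verify what actually landed on a client, the policy writes a SupportedEncryptionTypes value to the registry; the path below is my recollection of where the Group Policy engine puts it, so confirm with GPRESULT as well:
# Path is an assumption - check the effective Kerberos encryption types set by the policy.
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Kerberos\Parameters' -Name SupportedEncryptionTypes
# 0x3 means DES only (broken); if you truly need DES, you want DES plus RC4/AES, e.g. 0x1F.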
Other Stuff
Windows 8 Consumer Preview releases February 29th, as if you didn’t already know it. Don’t ask me if this also means Windows Server 8 Beta the same exact day, I can’t say. But it definitely means the last 16 months of my life finally start showing some results. As will this blog…
Apparently we’ve been wrong about Han and Greedo since day one. I want to be wrong though. Thanks for passing this along Tony. And speaking of which, thanks to Ted O and the rest of the gang at LucasArts for the awesome tee!