OpsMgr R2 By Example: the Active Directory Management Pack

OpsMgr’s functionality is provided by management packs. While Microsoft provides documentation in the form of management pack guides for each management pack (MP), we believe there is benefit in providing a high-level overview of how to implement and tune those management packs. A while back, the OpsMgr blog ran a series of “OpsMgr by Example” postings covering some of the more well-known management packs. This posting marks the return of an updated series covering a number of management packs available during the OpsMgr 2007 R2 timeframe.

Each of these postings will discuss the steps taken to implement a particular MP, along with examples of alerts and tuning steps. In doing so, our goal is to provide a 5,000-foot perspective and also show the details encountered while tuning during a deployment. Those MPs which have tuning steps particular to OpsMgr 2007 R2 will be noted. We also provide thoughts on how many of the management packs could evolve. First to be covered – the Active Directory MP (ADMP).

The ADMP is available as a single download containing different libraries to monitor Active Directory 2000, 2003, and 2008 domain controllers.

Installing the ADMP

  1. Download the Active Directory Management Pack from the Management Pack Catalog (http://technet.microsoft.com/en-us/opsmgr/cc539535.aspx). The Active Directory Management Pack Guide is included in the download and labeled “OM2007_MP_AD2008.doc.” Beginning with the R2 release of Operations Manager, you can download management packs directly using the OpsMgr user interface (UI). It is suggested you actually download from the website and install it yourself so that you can have a copy of the management pack guide available during installation. If you already have a copy of the management pack guide, use the OpsMgr UI functionality to download and install the management pack.
  2. Read the Management Pack guide – cover to cover. This document spells out in detail some important pieces of information you will need to know.
  3. Import the AD Management Pack (using either the Operations console or PowerShell).
  4. Deploy the OpsMgr agent to all domain controllers (DCs). The agent must be deployed to all DCs. Agentless configurations will NOT work for the AD Management Pack.
  5. Get a list of all domain controllers from the Operations console. In the Authoring space, navigate to Authoring -> Groups -> AD Domain Controller Group (Windows 2008 Server). Right-click on the group(s) and select View Group Members.
  6. Enable Agent Proxy configuration on all domain controllers identified from the groups. This is in the Administration node, under Administration -> Device Management -> Agent Managed. Right-click each domain controller, select Properties, click the Security tab, and then check the box labeled Allow this agent to act as a proxy and discover managed objects on other computers. Perform this action for every domain controller, even if you add the DC after your initial configuration of OpsMgr. For a simple method to bulk-enable the proxy setting, see http://www.systemcenterforum.org/news/opsmgr-enabling-agent-proxy-for-all-computers-hosting-an-instance-of-a-specific-object-class/ (thanks to Ziemek Boroski on the http://ops-mgr.spaces.live.com site for this).
  7. Configure the Replication account in the Operations console, under Administration -> Security (full details for this are in the AD MP Guide). Do this for every domain controller, even if you add the DC after your initial OpsMgr configuration.
  8. Validate the existence of the OpsMgrLatencyMonitors container (this was previously named the MOMLatencyMonitors container). Within this container, there should be sub-folders for each DC, using the name of each domain controller. If the container does not exist, it is often due to insufficient permissions. (See information configuring the Replication account within the AD MP Guide for details.)
  9. Open the Operations console. Go to the Monitoring node and navigate to Monitoring -> Microsoft Windows Active Directory -> Topology Views and validate functionality. (You may have to set the scope to the AD Domain Controllers Group to get these views to populate).
  10. Check to make sure Active Directory shows up under Monitoring -> Distributed Applications as a distributed application that is in the Healthy, Warning or Critical state. If it is in the “Not Monitored” state, check for domain controllers that do not have the agent installed or are in a “gray” state.
  11. Create a MicrosoftWindowsActiveDirectory_Overrides management pack to contain any overrides required for the MP (hey, if it’s not created now you’ll never remember to create it and you will end up using the default MP and that’s not good – see http://cameronfuller.spaces.live.com/blog/cns!A231E4EB0417CB76!1152.entry or System Center Operations Manager 2007 Unleashed [Sams, 2008] for details).
  12. The Active Directory Helper Object (known as oomads) needs to be installed on each domain controller that OpsMgr will monitor. This file (OOMADs.msi) is available on the OpsMgr R2 installation media in the HelperObjects folder, under the subfolder for the appropriate version of the operating system (amd64, i386, or ia64).
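As a rough illustration of step 12, the location of OOMADs.msi on the media can be derived from the processor architecture of the domain controller. This is only a sketch: the function name, the drive letter, and the mapping from architecture strings (as reported by the Windows PROCESSOR_ARCHITECTURE environment variable) to the folder names are assumptions; only the HelperObjects folder and the amd64, i386, and ia64 subfolders come from the step above.

```python
from pathlib import PureWindowsPath

# Folder names are taken from the install step; the mapping from
# architecture strings to those folders is an assumption.
ARCH_FOLDERS = {"AMD64": "amd64", "x86": "i386", "IA64": "ia64"}


def oomads_path(media_root, processor_architecture):
    """Return the expected path of OOMADs.msi on the OpsMgr R2 media."""
    folder = ARCH_FOLDERS[processor_architecture]
    return PureWindowsPath(media_root) / "HelperObjects" / folder / "OOMADs.msi"


# Hypothetical example: media mounted on D:, 64-bit domain controller.
print(oomads_path("D:\\", "AMD64"))
```

An unknown architecture string raises a KeyError, which is probably preferable to silently installing the wrong helper object.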

Deploying the Active Directory 2008 management pack was relatively painless. After importing the management pack, there was no significant impact on processors seen on the domain controllers. The Active Directory Topology Root appeared as a distributed application and showed a health state of green. The Active Directory diagram view also worked as expected.

Changes to the Run As Account in R2

Run As accounts changed in OpsMgr R2: you can now define the targets to which a Run As account is distributed, and this affects the Active Directory Management Pack. The simplest (and least secure) approach is to use the All targeted objects option, but this causes the Run As accounts to be deployed everywhere (including to remote forests where you should not attempt to use the account). The recommended approach is to create a Run As account for the AD MP Account Run As profile that specifies the domain controllers’ computer objects as its target.

Tuning / Alerts to look for in the Active Directory MP

The following alerts were encountered and resolved while tuning the various Active Directory management packs (these are listed in alphabetical order by Alert name):

Alert: (none)

Issue: The SysVol for Windows 2008 portion of the Management Pack for Active Directory Server 2008 (Monitoring) identified an alert as part of the DFS Service Health alert monitor for one of the domain controllers in our environment. No additional knowledge was available.

Resolution: It was determined the technician had uninstalled the Exchange 2007 tools from the domain controller at the time these alerts activated. The alerts have not recurred since that time. The alerts were closed, and the server is being monitored to see whether they recur.

Alert: A problem has been detected with the trust relationship between two domains.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). This critical alert did NOT auto-resolve. This was detected by the alert rule “A problem has been detected with the trust relationship between the two domains.” As part of troubleshooting, verified that the Last Modified date occurred during the outage (add this column to the display by personalizing the view on the Active Alerts to include the field) and the Repeat Count was not incrementing.

Resolution: Use the Active Directory Domain Controller Server 2008 Computer Role Task of Enumerate Trusts to validate all trusts were working after site connectivity was re-established. Then log into the domain controller reporting the error and use the Active Directory Domains and Trusts UI to validate each of the trusts. Close the alert manually.
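The check described in the Issue above (Last Modified falls inside the outage and the Repeat Count is not incrementing) can be written down as a small rule. This is a sketch only: the function and parameter names are hypothetical, and the values would have to be read by hand from the Active Alerts view.

```python
from datetime import datetime


def looks_stale(last_modified, outage_start, outage_end,
                repeat_count_then, repeat_count_now):
    """True when an alert was last modified during the outage and its
    Repeat Count has not increased since, i.e. it is probably left over."""
    modified_during_outage = outage_start <= last_modified <= outage_end
    still_repeating = repeat_count_now > repeat_count_then
    return modified_during_outage and not still_repeating


# Hypothetical values for illustration.
outage_start = datetime(2009, 8, 1, 2, 0)
outage_end = datetime(2009, 8, 1, 4, 30)
print(looks_stale(datetime(2009, 8, 1, 3, 15), outage_start, outage_end, 7, 7))
```

An alert that passes this check still needs the trusts validated before it is closed manually, as described in the Resolution.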

Alert: A problem has been detected with the trust relationship between two domains

Issue: This alert is occurring on domain controllers that cannot communicate with the domain controller in the trusted domain to validate this trust.

Resolution: The domain controllers in these remote locations cannot reach the domain controllers in the trusted domain because of routing restrictions, and they are not required to validate the trust. Disable this alert for those domain controllers.

Alert: A problem has been detected with the trust relationship between two domains

Issue: The domain controllers could not connect to the domain controller in the other domain. This was due to a routing issue between the specific domain controllers and the domain controller in the remote domain. Remote sites were connected via VPN and could not route to that subnet.

Resolution: Provided routing from the domain controllers to the domain controller in the other domain.

Alert: A problem has been detected with the trust relationship between two domains

Additional Alert: A problem with the inter-domain trusts has been detected.

Issue: This alert is occurring on domain controllers that cannot communicate with the domain controller in the trusted domain to validate this trust.

Resolution: Tested first with the NETDOM command (override with parameters to do a dsquery /domain: /verify dc) first for the local domain (success) then for the remote domain reporting the failure (failed with cannot contact the remote domain). Then nltest first for the local domain (success) then for the remote domain reporting the failure (ERROR_NO_LOGON_SERVERS). Ran a DCDIAG on the server next and a NETDIAG. Failures on the server on both NETDOM and NLTEST queries. Ran the enumerate trusts task on the system, it fails on the remote domain as well (AD_Enumerate_Trusts.vbs). DNS was inconsistent in the environment (used nslookup with different servers to validate that the results of the lookup to the remote domain name were not consistent). Made DNS consistent and flushed DNS on the server experiencing the alerts. The critical level alert resolved itself, closed the other one.
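The DNS consistency check from this resolution (asking several DNS servers for the same name and comparing the answers) could be sketched as follows. The helper names, server names, and addresses are hypothetical; the only thing assumed about nslookup is that it accepts the name to resolve followed by the server to query. Collecting the answers is left to the reader, since parsing nslookup output is environment specific.

```python
def nslookup_command(name, dns_server):
    """Build the nslookup invocation that asks one specific DNS server."""
    return ["nslookup", name, dns_server]


def dns_is_consistent(answers_by_server):
    """True when every DNS server returned the same set of addresses.

    answers_by_server maps a DNS server name to the set of addresses it
    returned for the remote domain name.
    """
    distinct_answers = {frozenset(a) for a in answers_by_server.values()}
    return len(distinct_answers) <= 1


# Hypothetical results: dns2 disagrees with dns1 and dns3.
answers = {
    "dns1": {"192.0.2.10"},
    "dns2": {"192.0.2.99"},
    "dns3": {"192.0.2.10"},
}
print(nslookup_command("remote.example.com", "dns1"))
print(dns_is_consistent(answers))
```

Once the servers agree, flushing the resolver cache on the alerting domain controller remains a separate step, as noted above.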

Alert: A problem has been detected with the trust relationship between two domains

Additional Alert: A problem with the inter-domain trusts has been detected

Issue: Specific domain controllers were reporting the alert as an issue to verify the trust between two forests in the environment.

Resolution: The domain controller in question did not have a zone to provide name resolution to the other forest. Added the zone to the domain controller’s DNS.

Alert: A problem has been detected with the trust relationship between two domains

Issue: This occurs when a domain controller has been removed from the environment, and does not represent an issue if the Alert Description contains the information that it cleaned up the naming context.

Resolution: Alerts of this type can be closed, as they will occur on each domain controller in the environment that sees the piece of the replication that is no longer relevant.

Alert: A problem with the inter-domain trusts has been detected.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). This critical alert did NOT auto-resolve. This was detected by the AD Trust Monitoring monitor, which runs every 5 minutes using the AD Monitor Trusts script. It was verified that the Last Modified date occurred during the outage (add this column to the display by personalizing the view on the Active Alerts to include the field) and the Repeat Count was not incrementing.

Resolution: Use the Active Directory Domain Controller Server 2008 Computer Role Task of Enumerate Trusts to validate all trusts were working after site connectivity was re-established. Next, log into the domain controller reporting the error and use the Active Directory Domains and Trusts UI to validate each of the trusts. This alert should auto-resolve when the trust relationships are working, but that functionality does not appear to work. The alert was closed manually.

Alert: A replication island has been detected. Replication will not occur across the enterprise.

Issue: In Active Directory Sites and Services, DC1 replicated with DC2 but DC2 did not replicate with DC1.

Resolution: DC1 was only referencing itself for DNS as 127.0.0.1, with no DNS entry for the remote DC in its TCP/IP properties. Rebooted DC2 after the change since DNS could not connect to itself on DC2.

The root cause of this alert was an issue with RPC between the two domain controllers. RPC in the environment is coded to a specific port and this port change had not been made to the second domain controller.

Alert: Account Changes Report Available.

Issue: Informational alert, which can be accessed in the AD SAM Account Changes report (available on the right side under Active Directory Domain reports).

Resolution: No resolution required. Checked the AD SAM Account Changes report (available on the right side under Active Directory Domain reports) to see the changes that were available.

Alert: Active Directory cannot perform an authenticated RPC call to another DC because the SPN for the destination DC is not registered on the KDC

Issue: One domain controller was offline during the time period, a second domain controller was promoted, the FSMO roles were moved, and then the process was rolled back due to technical issues. Caused by replication issues in the environment. The domain controller had been dcpromoted back out and back in and resulted in old records that were within ADSIEdit and were invalid. This was part of the ForestDNSZones,DC=_msdcs.abcco.com records.

Resolution: Added SPN information manually to the server to work around the errors.

Alert: AD cannot allocate memory

Issue: The domain controller has 4GB of memory but when it was logged into there were more than 6GB of memory in use. Attempted to stop programs which appeared to be causing this (large numbers of cmds and nslookup tasks that were failed) but this did not end up freeing the memory.

Resolution: Per the product knowledge, rebooted the server and verified that the memory had returned to more reasonable numbers (less than 1GB in use). Closed the alert, but tracking this to see if it recurs on this server.

Alert: AD Client Side – Script Based Test Failed to Complete.

Issue: AD Replication Partner Op Master Consistency: The script ‘AD Replication Partner Op Master Consistency’ could not create object ‘McActiveDir.ActiveDirectory.’ This is an unexpected error. The error returned was ‘ActiveX component can’t create object’ (0x1AD)

Resolution: In MOM 2005, this was resolved by changing the Action account. In OpsMgr 2007, this alert occurred in a different domain than the one with the OpsMgr root management server (RMS). To resolve this, create a Run As Account for the domain (DMZ) and assign the Run As Account to the AD domain controllers in the DMZ domain.

Alert: AD Client Side – Script Based Test Failed to Complete.

Issue: This alert is generated by the AD Replication Partner Op Master Consistency monitor. The system reporting the error was generating an error of event id 45 in the Operations Manager Log from the source of Health Service Script.

This event is occurring on an hourly basis (12:57, 1:58, and so on):

AD Replication Partner Op Master Consistency: The script ‘AD Replication Partner Op Master Consistency’ failed to execute the following LDAP query: ‘<LDAP://servername.odyssey.com/CN=Configuration,DC=ODYSSEY,DC=COM>;(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree’.

The error returned was ‘Table does not exist.’ (0x80040E37)

This alert is linked to “Could not determine the FSMO role holder.” alerts that are occurring.

Resolution: Believe this was related to misconfigurations of the anti-virus settings on the domain controllers in the environment.

Alert: AD Domain Performance Health Degraded.

Issue: More than 60% of the DCs contained in this AD Domain report a Performance Health problem

Resolution: This alert indicates that there are alerts that are occurring in more than 60% of the domain controllers in a domain. This alert does not require an action for itself but does require analysis to determine what is causing the domain controllers to be in a degraded state.
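To make the rollup arithmetic concrete, here is a minimal sketch of the "more than 60%" test. The function name is hypothetical, and treating the comparison as strictly greater than 60% is an assumption based on the wording of the alert, not on the management pack's internals.

```python
def rollup_degraded(unhealthy_dcs, total_dcs, threshold_percent=60):
    """True when more than threshold_percent of the DCs are unhealthy.

    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    if total_dcs == 0:
        return False
    return unhealthy_dcs * 100 > total_dcs * threshold_percent


print(rollup_degraded(3, 5))  # exactly 60%, not "more than"
print(rollup_degraded(4, 5))  # 80%
```

In a small domain this means a single misbehaving DC can tip the rollup: with two domain controllers, one unhealthy DC is 50% and does not trip it, but both together do.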

Alert: AD Op Master is inconsistent.

Issue: Tested using the AD Replication Partner Op Master Consistency alert monitor, which runs every minute, to verify the incoming replication partners for the domain controller show the same operations masters. Also used the REPADMIN Replsum task in the Active Directory MP.

Resolution: The REPADMIN Replsum command validated that replication was functioning correctly (had to override the “Support Tools Install Dir” on Windows 2008 to %windir%\system32 to make the task work correctly). The override was done when the task was actually run. It’s not created as an override in the OpsMgr console or in the Authoring View but rather when the task is executed. The link between the domain controllers has been running close to fully saturated. The alert auto-resolved once the network utilization slowed down.
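The "Support Tools Install Dir" override amounts to picking a different folder for repadmin.exe depending on the operating system. A sketch of that choice is shown below; the function names and the version strings are hypothetical, and the two directories are the ones mentioned in this posting (the Support Tools folder under %ProgramFiles% and %windir%\system32 on Windows 2008).

```python
def support_tools_dir(os_version):
    """Return the directory expected to contain repadmin.exe."""
    if os_version == "2008":
        # On Windows 2008 the tools live in system32 (the override value).
        return r"%windir%\system32"
    # Otherwise assume the Support Tools install location.
    return r"%ProgramFiles%\Support Tools"


def replsum_command(os_version):
    """Build the REPADMIN /replsum command line for the given OS version."""
    return '"{}\\repadmin.exe" /replsum'.format(support_tools_dir(os_version))


print(replsum_command("2008"))
print(replsum_command("2003"))
```

The environment variables are left unexpanded on purpose, since they are resolved on the domain controller where the task runs.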

Alert: AD Op Master is inconsistent

Issue: Active Directory Operations Master role is found to be in a transitional state.

Resolution: This message is generated when an AD Operations Master role is moved from one server to another and can be safely ignored.

Alert: AD Op Master is inconsistent

Issue: Tested using the AD Replication Partner Op Master Consistency alert monitor, which runs every minute, to verify the incoming replication partners for the domain controller show the same operations masters. Also used the REPADMIN Replsum task in the Active Directory MP.

Resolution: Additional information on this alert is available at Marcus Oh’s blog at http://marcusoh.blogspot.com/2009/07/understanding-ad-op-master-is.html.

Alert: AD Replication is occurring slowly

Issue: Same as identified in alert AD Replication is slower than the configured threshold. This rule does not provide the ability to override the default configuration of 15 minutes. The AD environment is not configured with the default of 15 minutes so these rules do not apply as they are still replicating within a successful timeframe.

Resolution: Disabled this rule (AD Replication is occurring slowly) for group AD Domain Controller Group (Windows 2003 Server). You could also do this for individual servers if there were a limited number of these where the AD replication was not configured with default replication times of 15 minutes. Closed the alerts.

Alert: AD Replication is occurring slowly

Issue: Occurred on a domain controller that had been having issues replicating for a period of time.

Resolution: Rebooted the domain controller, this alert was generated after the reboot. The script is scheduled to run every 900 seconds (every 15 minutes). Used the REPADMIN Replsum command to validate that replication was functioning correctly (had to override the “Support Tools Install Dir” on Windows 2008 to %windir%\system32). No errors were found on the REPADMIN Replsum command. Waited the 15 minutes to verify the domain controller was not continuing to experience the issue, and closed the alert.

Alert: AD Replication is slower than the configured threshold

Issue: The default thresholds are Intersite Expected Max Latency (min) = 15 and Intrasite Expected Max Latency (min) = 5.

Issue: This alert will also occur if connectivity is lost between sites for a long enough period of time.

Resolution: If the alert is not current and not repeating and if replication is occurring and the Repadmin Replsum task comes up clean, this alert can be noted (to see if there is a consistent day of week or time that it occurs at) and closed. Added a diagnostic to the AD Replication Monitoring monitor, for the critical state, taking the information from the REPADMIN Replsum task which provided (You must have the admin utilities installed on the DC for this to work):

REPADMIN.EXE
%ProgramFiles%\Support Tools\ /replsum 1200

Created the diagnostic to run automatically using:
Program: REPADMIN.EXE
Working Directory: %ProgramFiles%\Support Tools
Parameters: /replsum

Options available included changing the replication topology to replicate every 15 minutes, or configuring overrides. To resolve, tried creating a custom group for the servers in the location (see the Creating Computer Groups based on AD Site in OpsMgr blog entry on http://Cameronfuller.spaces.live.com for additional information) and created an override for the new group changing the Intersite Expected Max Latency to 120 (so it would be double the configuration in AD Sites and Services). Performed this configuration for each remote location that did not have a 15 minute replication interval. You could also do this for all domain controllers, using the domain controller computer group(s). This did not function as expected but is used as an example of how overrides can be creatively configured, in this case based upon sites!
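The override value used here is simple arithmetic: twice the replication interval configured in AD Sites and Services. A minimal sketch, assuming hypothetical site names and intervals and a hypothetical helper name; keep in mind that the override itself did not function as expected in this deployment, so this only illustrates how the numbers were chosen.

```python
DEFAULT_INTERSITE_MAX_LATENCY_MIN = 15  # default quoted for this alert


def intersite_latency_override(replication_interval_min, factor=2):
    """Return the Intersite Expected Max Latency override, in minutes,
    or None when the site already replicates at the default interval."""
    if replication_interval_min <= DEFAULT_INTERSITE_MAX_LATENCY_MIN:
        return None
    return replication_interval_min * factor


# Hypothetical sites and their replication intervals in minutes.
site_intervals = {"HQ": 15, "Remote1": 60, "Remote2": 180}
for site, interval in sorted(site_intervals.items()):
    print(site, intersite_latency_override(interval))
```

A 60 minute interval yields the 120 minute override described above.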

Alert: AD Replication Monitoring – Access denied

Issue: This occurred on one domain controller and there also was an alert stating that it failed to create the MOMLatencyMonitors container. Validated the container by logging into the domain controller, opening up AD Users and Computers, View -> Advanced Features, and verifying the container (and the two existing domain controllers as sub-containers) exists.

Resolution: Already resolved, as the MSAA had the permissions required to create this container. Validated the MOMLatencyMonitors container existed and that container included sub-folders matching the name of each domain controller. (If the container does not exist, it is often due to insufficient permissions; see configuring the replication account within the AD MP Guide for configuration information.)

Alert: AD Replication Monitoring – Access denied

Issue: This occurred on several domain controllers when the OpsMgrLatencyMonitors container was removed. Validated the container by logging into the domain controller, opening up AD Users and Computers, View -> Advanced Features, and verifying the container (and the two existing domain controllers as sub-containers) exists.

Resolution: Already resolved as the MSAA had the permissions required to create this container. Validated the OpsMgrLatencyMonitors container existed and that container included sub-folders matching the name of each domain controller. (If the container does not exist, it is often due to insufficient permissions; see configuring the replication account within the AD MP Guide for configuration information.)

Alert: AD Replication Monitoring – Time skew detected

Issue: Caused by domain controllers running on Virtual Servers that were synchronizing with the host operating system while the host operating system was not time synchronized.

Resolution: Fixed the actual time on the domain controllers and configured the Guest operating system in Virtual Server to not synchronize with the Host operating system. This was accomplished by shutting down the Guest operating system, configuring the Virtual Machine Addition Properties, under additional features uncheck Host Time Synchronization, and restarting the Guest operating system.

Alert: AD Site Availability Health Degraded

Issue: Caused by another alert that is affecting the DCs availability. Check the status of AD as a distributed application to determine what alert is affecting AD availability.

Resolution: Investigated the alert causing the DC availability issue, which in this case was the Logical Disk Free Space is Low alert.
Another example of this was a domain controller with a second power supply that was not plugged in and was alerting via the HP management pack.

Alert: AD Site Performance Health Degraded.

Issue: More than 60% of the DCs contained in this AD Site report a Performance Health problem

Resolution: This alert indicates that there are alerts that are occurring in more than 60% of the domain controllers in a site. This alert does not require an action for itself but does require analysis to determine what is causing the domain controllers to be in a degraded state.

Alert: Could not determine the FSMO role holder.

Issue: Each domain controller in the environment reported the error when trying to determine the Schema Op Master on the various domain controllers. The rule generating this was “Could not determine the FSMO role holder.”

Resolution: We used the NETDOM Query FSMO task (changing the Support Tools Install Dir to %windir%\system32) to validate the FSMO role holders on each domain controller.

Alert: Could not determine the FSMO role holder.

Additional Alert: AD Client Side – Script Based Test Failed to Complete

Additional Alert: AD Op Master is inconsistent

Issue: These three alerts are DNS related. In one situation, there was a bad DNS record on one of the top-level DNS servers. One could ping the NetBIOS name, but could not ping the FQDN (it was a DC in another domain within the forest). In the second instance, there was a bad IP address in the HOSTS file. Once all DNS resolution problems were fixed, the alerts auto-resolved.

The alerts have also come in and then auto-resolved on their own. This happened when someone rebooted a DC in another domain and that server was the only DC for that domain.

A good link to investigate DNS issues is http://www.windowsnetworking.com/articles_tutorials/Using-NSLOOKUP-DNS-Server-diagnosis.html.

Resolution: Resolving DNS issues in the environment.

Submitted By: CK on the Ops-Mgr.spaces.live.com website

Alert: DC has failed to synchronize its naming context with replication partners.

Issue: One of the domain controllers in the environment went to a grayed out status.

The server having the issues reported the “DC has failed to synchronize its naming context with replication partners” issue and “A problem has been detected with the trust relationship between two domains” and “AD Replication is occurring slowly” and “Script Based Test Failed to Complete” (for multiple AD related scripts).

Other domain controllers reported “Could not determine the FSMO role holder” and “AD Client Side – Script Based Test Failed to Complete.”

Events also occurred on the client system (21006 OpsMgr Connector, 20057 OpsMgr Connector, 21001 OpsMgr Connector).

Resolution: Installed the Telnet client feature to test connectivity to the management server. Telnet connectivity failed from this system but not from others. Restarted the OpsMgr Health service but it had no effect on the gray status. After rebooting the system, the status went back to non-gray.

Alert: DC has failed to synchronize its naming context with replication partners.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). The rule generating this alert is “DC has failed to synchronize naming context with its replication partner.”

Resolution: The alerts occurred when connectivity was lost between the sites. These alerts had a Repeat Count of 0. Used the REPADMIN Replsum command to validate that replication was functioning correctly (had to override the “Support Tools Install Dir” on Windows 2008 to %windir%\system32 to make the task work correctly). Closed the alerts manually.

Alert: DC is both a Global Catalog and the Infrastructure Update master

Issue: The domain controller was both a Global Catalog and the Infrastructure master. This configuration is acceptable as long as all domain controllers are GCs, but this does result in additional replication traffic for the additional domain.

Resolution: One option is to create an override that disables this alert on the server, but in most situations that does not resolve the underlying issue. For this environment, where all DCs are GCs and the additional domain is a small child domain with only minor amounts of information, the recommended approach is to create the override.

If this is not the case, the preferred approach to take is to deploy an additional domain controller that is NOT a Global Catalog server and run the Infrastructure Update master on that server. This new domain controller can be deployed on either physical or virtual configurations depending upon the client requirements. Further detail on this condition is available in the Microsoft article at http://support.microsoft.com/default.aspx/kb/251095.

Alert: KCC cannot compute a replication path

Issue: KCC detected problems on multiple domain controllers

Resolution: Connectivity was lost from the central site to a remote site for a period of several hours. The remote site was down due to a power outage. Errors were logged every 15 minutes from when it was down until when the site was back online. This also occurred when a domain controller had been shut off but still existed from the perspective of Active Directory. This can also occur in environments where the site topology is set to automatically generate the site links but the network is configured so that some sites cannot see other sites. (As an example, in a configuration with a hub in Dallas and sites in Frisco and Plano, where both sites can see Dallas but cannot see each other.)
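The Dallas, Frisco, and Plano example can be checked mechanically: list every pair of sites and flag the pairs that have no direct network path. This is only a sketch of the idea; the function name is hypothetical and the routable links have to be supplied by someone who knows the network.

```python
from itertools import combinations


def unreachable_pairs(sites, routable_links):
    """Return the site pairs that cannot reach each other directly."""
    links = {frozenset(link) for link in routable_links}
    return [pair for pair in combinations(sorted(sites), 2)
            if frozenset(pair) not in links]


sites = ["Dallas", "Frisco", "Plano"]
routable_links = [("Dallas", "Frisco"), ("Dallas", "Plano")]
print(unreachable_pairs(sites, routable_links))  # [('Frisco', 'Plano')]
```

Any pair this reports is a candidate for the situation described above, where automatically generated site links assume connectivity that the network does not provide.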

Alert: One or more domain controllers may not be replicating.

Issue: The AD MP will report replication issues across all DCs if only one was down (and thus not able to replicate its monitor objects).

Resolution: Get all domain controllers monitored by OpsMgr. Validate replication in the environment.

Alert: Overall Essential Services state

Issue: The Overall Essential Services state monitor portion of the Active Directory Domain Controller Server 2008 Computer role identified an alert. No additional knowledge was available.

Resolution: After speaking with the technician, it was determined he had uninstalled the Exchange 2007 tools from the domain controller at the time these alerts activated. The alerts have not recurred since that time. Closed the alerts and continued monitoring to see whether they recur.

Alert: Performance Module could not find a performance counter.

Issue: In PerfDataSource, could not resolve counter DirectoryServices, KDC AS Requests, Module will be unloaded.

Resolution: Created a Run As Account and configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

Alert: Replication is not occurring – All replication partners have failed to synchronize

Issue: The Alert Description is the key on this alert. All replication partners are now replicating successfully.

Resolution: Alert description of “AD Replication Monitoring: All replication partners are now replicating successfully” is a success condition and does not require any intervention other than closing the alert.

Alert: Script Based Test Failed to Complete.

Issue: AD Lost And Found Object Count: The script ‘AD Lost And Found Object Count’ failed to create object ‘McActiveDir.ActiveDirectory’. This is an unexpected error. The error returned was ‘ActiveX component can’t create object’ (0x1AD)

Resolution: Configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

Alert: Script Based Test Failed to Complete.

Issue: AD Database and Log: The script ‘AD Database and Log’ failed to create object ‘McActiveDir.ActiveDirectory’. The error returned was ‘ActiveX component can’t create object’ (0x1AD).

Resolution: Configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

Alert: Script Based Test Failed to Complete.

Issue: AD Database and Log: The script ‘AD Database and Log’ failed to create object ‘McActiveDir.ActiveDirectory.’ The error returned was ‘ActiveX component can’t create object’ (0x1AD)

Resolution: Installed OOMADS from OpsMgr 2007 R2 installation media. The OOMADs.msi file is included within the HelperObjects folder on the media within the appropriate version of the operating system (amd64, i386, ia64).

Alert: Script Based Test Failed to Complete

A problem has been detected with the trust relationship between two domains

Issue: The server was a domain controller that was exhibiting a variety of different errors including the following:

AD Monitor Trusts: The trusts between this domain (ABC.COM) and the following domain(s) are in an error state: xyz.com (inbound), the error is ‘There are currently no logon servers available to service the logon request’ (0x51F)

AD Replication Partner Count: The script ‘AD Replication Partner Count’ failed to bind to ‘LDAP://DC01.ABC.COM/CN=DC01,CN=Servers,CN=Plano,CN=Sites,CN=Configuration,DC=ABC,DC=COM.’ The error returned was ‘Object variable not set’ (0x5B)

1153 of these in 4 days + 1 hour (1:17:28 pm) – failing every 5 minutes.

AD Lost And Found Object Count: Script ‘AD Lost And Found Object Count’ was unable to bind to the lost and found container.

1152 of these in 4 days + 1 hour (1:17:28 pm) – failing every 5 minutes.

AD Database and Log: The script ‘AD Database and Log’ encountered an error while trying to get the object ‘LDAP://DC01.ABC.COM/RootDSE.’ The error returned was: ‘The server is not operational.’ (0x8007203A)

388 of these in 4 days + 2 hours (1:17:44 pm).

AD Replication Monitoring: encountered a runtime error. Failed to bind to ‘LDAP://DC01.ABC.COM/RootDSE.’ The error returned was ‘The server is not operational.’ (0x8007203A)

799 of these in 4 days + 2 hours (1:17:44 pm).

Resolution: Logged into the server, attempted to open Active Directory Domains and Trusts, and received the message: “The configuration information describing this enterprise is not available. The server is not operational.” As part of debugging, rebooted the server. After the reboot, the error opening Active Directory Domains and Trusts no longer occurred. Closed the generated alerts to see if they would recur.
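The repeat counts listed in the issue are consistent with a script that fails on nearly every run. As a rough check (a sketch only; the helper name is made up for illustration), a test that runs every 5 minutes for 4 days and 1 hour executes about 1164 times, so 1153 failures means almost every run failed:

```python
def expected_runs(days, hours, interval_minutes):
    """Number of runs of a scheduled test over the given window."""
    total_minutes = (days * 24 + hours) * 60
    return total_minutes // interval_minutes


runs = expected_runs(days=4, hours=1, interval_minutes=5)
print(runs)                    # 1164
print(round(1153 / runs, 3))   # 0.991
```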

Alert: Script Based Test Failed to Complete

Issue: AD Database and Log: The script AD Database and Log failed to create object McActiveDir.ActiveDirectory. The error returned was: ActiveX component cannot create object (0x1AD)

Resolution: Uninstalled OOMADS (listed in Add/Remove Programs as Active Directory Management Pack Helper Object; the original version was .05MB in size) and installed the 64-bit equivalent, which was AMD64 in this case. The MSI had to be copied locally to the system before it could be installed. After installation it showed as .07MB in size within Add/Remove Programs.

Alert: Session setup failed because no trust account exists: Script ‘AD Validate Server Trust Event’

Issue: Specific computer accounts were identified multiple times as not containing a trust account

Resolution: This is caused either by systems that believe they are part of the domain but no longer are, or often by systems that are being imaged. The fix is either to drop and rejoin the system to the domain, or to close the alert if the system is no longer online. These alerts are not actionable. Decreased the severity of these alerts from critical to informational via an override.

Alert: Some replication partners have failed to synchronize

Issue: A domain controller was offline, so its replication partners could not synchronize with it.

Resolution: Bring the domain controller back online.

Alert: The AD Last Bind latency is above the configured threshold.

Issue: One domain controller had consistently high AD Last Bind Latency. Logging on to the system showed it to be extremely unresponsive.

Used the suggested tasks from product knowledge to validate that the bind was not going slowly; no high-CPU processes were identified on the system. The view available in product knowledge pointed to a large spike in the time required for the LDAP query (checking the Active Directory Last Bind counter). The spike occurred while processor utilization was very heavy on one of the domain controllers. This monitor checks every 5 minutes. The alert auto-resolved once the LDAP query was responding in an acceptable timeframe.

Resolution: Attempts to debug the issue were inconclusive and extremely difficult due to the performance issue with the system. Rebooted the domain controller, it came back online, and the AD Last Bind Latency returned to normal values.

Alert: The AD Machine Account Authentication Failures Report has data available.

Issue: The alert was raised on both domain controllers in the same physical location. The alert description contains the name of the computer account that is failing to authenticate. Multiple examples of this alert have been seen where sometimes it is an actionable alert and sometimes it is not.

In one case, there was a server where the computer account had been removed from the domain. This was a fully actionable situation where the computer had to be re-added to the domain to resolve the issue. The alert was then closed manually, because it is generated by a rule and will not auto-resolve.

In another situation, the computer account was for a workstation that was consistently not able to communicate with the domain controllers as it was connected remotely to another network via VPN.

Resolution: On the system that is having the issue, disjoin from the domain (answer No when prompted to reboot), rejoin the domain, and then reboot. Otherwise these alerts are not actionable. Decreased the severity of these alerts from critical to informational via an override.

Alert: The Domain Changes report has data available.

Issue: No issue, this was an informational message. This was generated when the PDC emulator role was moved between domain controllers in the environment.

Resolution: No actions required, this message is provided for situations where the PDC emulator role was moved unexpectedly.

Alert: The Domain Controller has been started

Issue: Notification that a domain controller was started, sent as an information message which is generated by an Alert Rule (since it is a rule not a monitor it will not auto-resolve). This is a good alert to keep as it provides a simple way to see when a domain controller is rebooted and when it is back online. Prior to this message, an information message should appear when the domain controller has been stopped.

Resolution: Manually close the alert as the domain controller reboot was expected.

Alert: The Domain Controller has been stopped

Issue: Notification that a domain controller was stopped, sent as an information message which is generated by an Alert Rule (since it is a rule not a monitor it will not auto-resolve). This is a good alert to keep as it provides a simple way to see what domain controllers have been rebooted to identify situations where domain controllers are unexpectedly rebooted. A follow-up information message appears when the domain controller has been restarted successfully.

Resolution: Manually closed the alert as the domain controller reboot was expected.

Alert: The Op Master Domain Naming Master Last Bind latency is above the configured threshold.

Issue: A large number of alerts are generated at > 5 seconds for warning and > 15 seconds for error.

Resolution: Per http://technet.microsoft.com/en-us/library/cc749936.aspx, the effective thresholds should be changed to warning at > 15 seconds and error at > 30 seconds. Created an override for all types of Active Directory Domain Controller Server 2008 Computer role to change Threshold Error Sec to 30 and Threshold Warning (sec) to 15 and stored it in the ActiveDirectory2008_Overrides management pack.
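To illustrate the effect of this override, here is a small sketch that classifies a bind latency against the default thresholds and the overridden ones. It is not the management pack's own logic; the function, dictionary, and parameter names are hypothetical.

```python
def classify_latency(seconds, warning_sec, error_sec):
    """Map a bind latency in seconds to a health state."""
    if seconds > error_sec:
        return "error"
    if seconds > warning_sec:
        return "warning"
    return "healthy"


# Thresholds as described above: defaults, then the overridden values.
DEFAULTS = {"warning_sec": 5, "error_sec": 15}
OVERRIDE = {"warning_sec": 15, "error_sec": 30}

for latency in (4, 8, 20, 45):
    print(latency,
          classify_latency(latency, **DEFAULTS),
          classify_latency(latency, **OVERRIDE))
```

An 8-second bind raises a warning under the defaults but stays healthy with the override, and a 20-second bind drops from error to warning.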

Alert: The Op Master PDC Last Bind latency is above the configured threshold

Issue: Bind from the domain controller identified in the alert to the PDC emulator is slower than 5 seconds for a warning and slower than 15 seconds for an error. This occurred in a remote site connecting to a central site with the PDC emulator role.

Resolution: The alert appears to be due to slowness in the link between the two locations, or to a condition where one of the two servers identified may have been overloaded. In this particular case it was caused by a domain controller that was overloaded due to insufficient hardware, which had to be decommissioned.

Alert: The logical drive holding the AD Database is low on free space.

Issue: Low disk space on the drive with the Active Directory database.

Resolution: The domain controller was a Windows Server 2008 virtual machine with a 20GB C drive assigned to it. This drive was increased to 30GB.

Alert: The logical drive holding the AD Logfile is low on free space

Issue: Low disk space on the drive with the Active Directory logfiles.

Resolution: The domain controller was a Windows Server 2008 virtual machine with a 20GB C drive assigned to it. This drive was increased to 30GB.

Alert: The Op Master Schema Master Last Bind latency is above the configured threshold.

Issue: A large number of alerts are generated at > 5 seconds for warning and > 15 seconds for error.

Resolution: Per http://technet.microsoft.com/en-us/library/cc749936.aspx, change the effective thresholds to warning at > 15 seconds and error at > 30 seconds. To resolve this alert, create an override for all types of Active Directory Domain Controller Server 2008 Computer role to change Threshold Error Sec to 30 and Threshold Warning (sec) to 15 and store it in the ActiveDirectory2008_Overrides management pack.

Alert: This domain controller has been promoted to PDC.

Issue: No issue, this was an informational message. The message was generated when the PDC emulator role was moved between domain controllers.

Resolution: No actions required, this message is provided for situations where the PDC emulator role was moved unexpectedly.

During testing, there was a period of time where network connectivity was lost to a site that had one of the domain controllers. The result was a flurry of alerts listed below:

Critical Alerts:

A problem with the inter-domain trusts has been detected

DNS 2008 Server External Addresses Resolution Alert

OleDB: Results Error

Warnings:

A problem has been detected with the trust relationship between two domains

AD Client Side – Script Based Test Failed to Complete (multiple)

Could not determine the FSMO role holder. (multiple)

DC has failed to synchronize its naming context with replication partners (multiple)

Issue: Loss of network connectivity between one site and another, both of which had domain controllers.

Resolution: Once network connectivity was re-established, all of the issues listed above were resolved.

Active Directory Management Pack Evolution

Three items would appear to be logical to enhance in future versions of the Active Directory management pack. These are:

Alert: A problem with the inter-domain trusts has been detected

Does not auto-resolve when the issue is resolved. A warning event id of 83 from the source of “Health Service Script” creates the critical situation, but no alerts appear which indicate that a successful trust test was accomplished so this alert always stays in a critical state.

Alert: AD Op Master is inconsistent

This alert is too sensitive. It should be raised only after the condition recurs two or three times, or the test should run every 5 minutes instead of every minute.

The Repadmin, Repadmin Replsum, and Repadmin Snap-shot tasks should have the correct default path for Windows Server 2008 systems

The path should be %windir%\system32.
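The suggestion for the AD Op Master is inconsistent alert, raising it only when the condition recurs, could look like the following sketch. It illustrates the idea only and is not how the management pack is implemented; the function name and the default of three consecutive failures are assumptions.

```python
def should_alert(results, required_failures=3):
    """Return True once `required_failures` consecutive checks have failed.

    `results` is an iterable of booleans, True meaning the check passed.
    """
    streak = 0
    for passed in results:
        # A passing check resets the streak; a failing one extends it.
        streak = 0 if passed else streak + 1
        if streak >= required_failures:
            return True
    return False


# A single transient failure does not alert; three in a row does.
print(should_alert([True, False, True, True]))     # False
print(should_alert([True, False, False, False]))   # True
```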
