OpsMgr by Example: The Active Directory 2008 Management Pack

This blog entry is the next in a series of Operations Manager posts that review the steps taken to install, configure, and tune management packs in real-world environments.

Installation

1) Download the Active Directory Management Pack (http://www.microsoft.com/downloads/details.aspx?FamilyId=008F58A6-DC67-4E59-95C6-D7C7C34A1447&displaylang=en). The Active Directory Management Pack Guide is included in the download and labeled “OM2007_MP_AD2008.doc.”

2) Read the Management Pack guide – cover to cover. This document spells out in detail some important pieces of information you will need to know.

3) Import the AD Management Pack (using either the Operations console or PowerShell).

4) Deploy the OpsMgr agent to all domain controllers (DCs). Agentless monitoring will NOT work for the AD Management Pack.

5) Get a list of all domain controllers from the Operations console. In the Authoring space, navigate to Authoring -> Groups -> AD Domain Controller Group (Windows 2008 Server). Right-click on the group(s) and select View Group Members.

6) Enable Agent Proxy configuration on all Domain Controllers identified from the groups. This is in the Administration space, under Administration -> Device Management -> Agent Managed. Right-click each domain controller, select Properties, click the Security tab, and then check the box labeled “Allow this agent to act as a proxy and discover managed objects on other computers.” Perform this action for every domain controller, even if the DC is added after your initial configuration of OpsMgr.

7) Configure the Replication account in the Operations console, under Administration -> Security (full details for this are in the AD MP Guide). Do this for every domain controller, even if a DC is added after your initial OpsMgr configuration.

8) Validate the existence of the “OpsMgrLatencyMonitors” container. Within this container, create sub-folders for each DC, using the name of each domain controller. If the container does not exist, the cause is often insufficient permissions. (See the information on configuring the Replication account in the AD MP Guide for details.)

9) Open the Operations console. Go to the Monitoring node and navigate to Monitoring -> Microsoft Windows Active Directory -> Topology Views and validate functionality. (You may have to set the scope to the AD Domain Controllers Group to get these views to populate.)

10) Check that Active Directory shows up under Monitoring -> Distributed Applications as a distributed application in the Healthy, Warning, or Critical state. If it is in the “Not Monitored” state, check for domain controllers that do not have the agent installed or that are in a “gray” state.

11) Create a MicrosoftWindowsActiveDirectory_Overrides management pack to contain any overrides required for the MP (hey, if it’s not created now we’ll never remember to create it and we’ll end up using the default MP and that’s not good – see http://cameronfuller.spaces.live.com/blog/cns!A231E4EB0417CB76!1152.entry or System Center Configuration Manager 2007 Unleashed for details there).
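Steps 5 and 6 tend to drift apart as domain controllers are added over time. The following is a minimal Python sketch, not part of the management pack, that compares the two lists. It assumes you have already exported the group members and the agents with proxy enabled as plain lists of computer names; the export itself is not shown, and the sample names are hypothetical.

```python
def missing_proxy(group_members, proxy_enabled):
    """Return DCs that are in the AD DC group but lack agent proxy.

    Both arguments are iterables of computer names. How they are exported
    from the Operations console is outside this sketch (assumption).
    Names are compared case-insensitively with whitespace trimmed.
    """
    enabled = {name.strip().lower() for name in proxy_enabled}
    return sorted(
        name.strip()
        for name in group_members
        if name.strip().lower() not in enabled
    )


# Hypothetical sample data for illustration only.
dcs = ["DC01.contoso.com", "DC02.contoso.com", "dc03.contoso.com"]
proxied = ["dc01.contoso.com", "DC03.CONTOSO.COM"]
print(missing_proxy(dcs, proxied))  # DCs that still need step 6
```

Any name this returns is a domain controller that still needs the proxy check box from step 6.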

Deploying the Active Directory 2008 Management Pack was relatively painless. After importing the management pack, we saw no significant processor impact on the domain controllers. The Active Directory Topology Root appeared as a distributed application and showed a health state of green. The Active Directory diagram view also worked as expected.
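For step 8, it can help to write down the distinguished names you expect to find before opening an LDAP browser. The sketch below is only an illustration under two assumptions that the AD MP Guide should confirm for your environment: that the “OpsMgrLatencyMonitors” container sits at the root of the domain naming context, and that each sub-folder is a container named after the DC. The function names are our own.

```python
def domain_to_dn(dns_name):
    """Convert a DNS domain name such as 'contoso.com' to 'DC=contoso,DC=com'."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join(f"DC={label}" for label in labels)


def latency_container_dns(dns_name, dc_names, container="OpsMgrLatencyMonitors"):
    """Build the DN expected for each DC's sub-container.

    Assumes the container sits at the root of the domain naming context
    and that each child is a container named after the DC (assumption).
    """
    base = f"CN={container},{domain_to_dn(dns_name)}"
    return {dc: f"CN={dc},{base}" for dc in dc_names}


# Hypothetical domain and DC names.
for dc, dn in latency_container_dns("contoso.com", ["DC01", "DC02"]).items():
    print(dc, "->", dn)
```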

Tuning/Alerts to Look For

We encountered and resolved the following alerts while tuning the Active Directory management pack.

Alert: The AD Last Bind latency is above the configured threshold.

Issue: One domain controller had consistently high AD Last Bind Latency. Logging on to the system showed that it was extremely unresponsive.

Following the product knowledge, we ran the suggested tasks; they showed that the bind was not slow and identified no high-CPU processes on the system. The view available in product knowledge pointed to a large spike in the time required for the LDAP query (the Active Directory Last Bind counter). The spike coincided with very heavy processor utilization on one of the domain controllers. This monitor checks every 5 minutes. The alert auto-resolved once the LDAP query responded within an acceptable timeframe.

Resolution: Attempts to debug the issue were inconclusive and extremely difficult because of the performance problem on the system. We rebooted the domain controller; once it came back online, the AD Last Bind Latency returned to normal values.

Alert: A problem has been detected with the trust relationship between two domains.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). This critical alert did NOT auto-resolve. This was detected by the alert rule “A problem has been detected with the trust relationship between the two domains.” We verified that the Last Modified date occurred during the outage (add this column to the display by personalizing the view on the Active Alerts to include the field) and the Repeat Count was not incrementing.

Resolution: We used the Active Directory Domain Controller Server 2008 Computer Role Task of Enumerate Trusts to validate all trusts were working after site connectivity was re-established. We then logged into the domain controller reporting the error and used the Active Directory Domains and Trusts UI to validate each of the trusts. We closed the alert manually.

Alert: A problem with the inter-domain trusts has been detected.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). This critical alert did NOT auto-resolve. This was detected by the AD Trust Monitoring monitor which runs every 5 minutes using the AD Monitor Trusts script. We verified that the Last Modified date occurred during the outage (add this column to the display by personalizing the view on the Active Alerts to include the field) and the Repeat Count was not incrementing.

Resolution: We used the Active Directory Domain Controller Server 2008 Computer Role Task of Enumerate Trusts to validate all trusts were working after site connectivity was re-established. We next logged into the domain controller reporting the error and used the Active Directory Domains and Trusts UI to validate each of the trusts. This alert should auto-resolve when the trust relationships are working, but that functionality does not appear to work. We manually closed the alert.

Alert: AD Op Master is inconsistent.

Issue: We tested using the monitor “AD Replication Partner Op Master Consistency,” which runs every minute and verifies that the incoming replication partners for the domain controller all report the same operations masters. We also used the REPADMIN Replsum task in the Active Directory MP.

Resolution: The REPADMIN Replsum command validated that replication was functioning correctly (we had to override the “Support Tools Install Dir” on Windows 2008 to %windir%\system32 to make the task work correctly). The link between the domain controllers had been running close to full saturation. The alert auto-resolved once network utilization dropped.
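The idea behind the consistency check can be sketched in a few lines of Python. This is not the management pack’s script; it is a simplified illustration that assumes you have already collected, by whatever means, the role holders each DC reports. The data structure and the sample names are hypothetical.

```python
from collections import defaultdict


def inconsistent_roles(reports):
    """Find operations master roles on which the DCs disagree.

    `reports` maps a DC name to a dict of {role: holder} as seen by that DC.
    Returns {role: sorted list of distinct holders} for every role with more
    than one distinct holder. Holder names are compared case-insensitively.
    """
    holders = defaultdict(set)
    for roles in reports.values():
        for role, holder in roles.items():
            holders[role].add(holder.strip().lower())
    return {
        role: sorted(names)
        for role, names in holders.items()
        if len(names) > 1
    }


# Hypothetical data: DC02 has a different view of the schema master.
reports = {
    "DC01": {"Schema master": "dc01.contoso.com", "PDC": "dc01.contoso.com"},
    "DC02": {"Schema master": "dc09.contoso.com", "PDC": "DC01.contoso.com"},
}
print(inconsistent_roles(reports))
```

An empty result means every DC agrees on every role. Anything else names the roles to investigate, which in our case turned out to be a symptom of a saturated link rather than a real role conflict.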

Alert: AD Client Side – Script Based Test Failed to Complete.

Issue: This alert is generated by the “AD Replication Partner Op Master Consistency” monitor. The system reporting the error was logging event ID 45 in the Operations Manager log from the Health Service Script source.

This event occurred on a roughly hourly basis (12:57, 1:58, and so on):

AD Replication Partner Op Master Consistency : The script ‘AD Replication Partner Op Master Consistency’ failed to execute the following LDAP query: ‘<LDAP://servername.contoso.com/CN=Configuration,DC=CONTOSO,DC=COM>;(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree’.

The error returned was ‘Table does not exist.’ (0x80040E37)

This alert is linked to “Could not determine the FSMO role holder.” alerts that are occurring.

Resolution: We believe this was related to a misconfiguration of the anti-virus settings on the domain controllers in the environment.
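When troubleshooting this kind of event, it is useful to re-run the same query with another LDAP tool. The query string in the event uses the ADSI LDAP dialect (base in angle brackets, then filter, attributes, and scope separated by semicolons). Here is a small Python sketch that splits such a string into its parts; the function name is our own, and it only parses text and does not contact a domain controller.

```python
def parse_adsi_query(query):
    """Split '<LDAP://host/base>;filter;attributes;scope' into its parts.

    The base is wrapped in angle brackets, so the first '>;' ends it.
    The last two ';'-separated fields are the attributes and the scope;
    everything in between is treated as the filter.
    """
    query = query.strip()
    if not query.startswith("<") or ">;" not in query:
        raise ValueError("not an ADSI LDAP-dialect query")
    ads_path, rest = query[1:].split(">;", 1)
    ldap_filter, attributes, scope = rest.rsplit(";", 2)
    prefix = "LDAP://"
    if not ads_path.upper().startswith(prefix):
        raise ValueError("base does not start with LDAP://")
    host, _, base_dn = ads_path[len(prefix):].partition("/")
    return {
        "host": host,
        "base_dn": base_dn,
        "filter": ldap_filter,
        "attributes": attributes.split(","),
        "scope": scope,
    }


# The query from the event above.
event_query = (
    "<LDAP://servername.contoso.com/CN=Configuration,DC=CONTOSO,DC=COM>;"
    "(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree"
)
print(parse_adsi_query(event_query))
```

The resulting host, base DN, filter, and scope can then be entered into whichever LDAP client you prefer to see whether the query fails outside of OpsMgr as well.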

Alert: DC has failed to synchronize its naming context with replication partners.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). The rule generating this alert is “DC has failed to synchronize naming context with its replication partner”.

Resolution: The alerts occurred when connectivity was lost between the sites. These alerts had a Repeat Count of 0. We used the REPADMIN Replsum command to validate that replication was functioning correctly (we had to override the “Support Tools Install Dir” on Windows 2008 to %windir%\system32 to make the task work correctly). We closed the alerts manually.
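If you want to run the same check outside the console task, the override amounts to pointing at the directory that actually contains the tool. A minimal Python sketch follows, assuming it runs on the Windows 2008 domain controller itself; the helper names are ours, and the C:\Windows fallback is an assumption used only when the windir variable is not set.

```python
import ntpath
import os
import subprocess


def tool_command(tool, args, tools_dir=None):
    r"""Build the command line for a support tool such as repadmin.exe.

    tools_dir mirrors the task's "Support Tools Install Dir" override.
    When omitted it defaults to <windir>\system32, falling back to
    C:\Windows if the windir environment variable is not set (assumption).
    """
    if tools_dir is None:
        windir = os.environ.get("windir", r"C:\Windows")
        tools_dir = ntpath.join(windir, "system32")
    return [ntpath.join(tools_dir, tool), *args]


def run_replsum(tools_dir=None):
    """Run 'repadmin /replsum' on a Windows DC and return its text output."""
    result = subprocess.run(
        tool_command("repadmin.exe", ["/replsum"], tools_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The same helper builds the NETDOM call used for FSMO checks, for example tool_command("netdom.exe", ["query", "fsmo"]).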

Alert: Could not determine the FSMO role holder.

Issue: Each domain controller in the environment reported the error when trying to determine the Schema Op Master on the various domain controllers. The rule generating this was “Could not determine the FSMO role holder”.

Resolution: We used the NETDOM Query FSMO task (changing the Support Tools Install Dir to %windir%\system32) to validate the FSMO role holders on each domain controller.

Alert: DC has failed to synchronize its naming context with replication partners.

Issue: One of the domain controllers in the environment went to a grayed-out status.

The server having the issues reported the following alerts: “DC has failed to synchronize its naming context with replication partners,” “A problem has been detected with the trust relationship between two domains,” “AD Replication is occurring slowly,” and “Script Based Test Failed to Complete” (for multiple AD-related scripts).

Other domain controllers reported “Could not determine the FSMO role holder” and “AD Client Side – Script Based Test Failed to Complete”.

Events also occurred on the client system (21006 OpsMgr Connector, 20057 OpsMgr Connector, 21001 OpsMgr Connector).

Resolution: We installed the Telnet client feature to test connectivity to the management server. Telnet connectivity failed from this system but succeeded from others. We then restarted the OpsMgr Health service, but it had no effect on the gray status. After we rebooted the system, the gray status cleared.
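The Telnet test is just a TCP connection attempt, so it can be reproduced without installing the Telnet client feature. Below is a minimal Python sketch. The port is not mentioned above; 5723 is the default agent-to-management-server port in OpsMgr 2007, so treat it as an assumption if your environment uses a different one. The function name is ours.

```python
import socket


def can_connect(host, port=5723, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    5723 is the default port agents use to reach a management server;
    pass a different port if your environment was changed (assumption).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running can_connect("your-management-server") from the gray DC and from a healthy one gives the same comparison we made with Telnet. A successful connection only proves the port is reachable, not that the Health service is working.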

Alert: AD Client Side – Script Based Test Failed to Complete.

Issue: AD Replication Partner Op Master Consistency: The script ‘AD Replication Partner Op Master Consistency’ could not create object ‘McActiveDir.ActiveDirectory’. This is an unexpected error. The error returned was ‘ActiveX component can’t create object’ (0x1AD)

Resolution: In MOM 2005, this was resolved by changing the Action account. In OpsMgr 2007, this alert occurred in a different domain than the one with the OpsMgr RMS server. To resolve this, we created a Run As Account for the domain (DMZ) and assigned the Run As Account to the AD domain controllers in the DMZ domain.

Alert: Script Based Test Failed to Complete.

Issue: AD Lost And Found Object Count: The script ‘AD Lost And Found Object Count’ failed to create object ‘McActiveDir.ActiveDirectory’. This is an unexpected error. The error returned was ‘ActiveX component can’t create object’ (0x1AD)

Resolution: We configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

Alert: Script Based Test Failed to Complete.

Issue: AD Database and Log : The script ‘AD Database and Log’ failed to create object ‘McActiveDir.ActiveDirectory’. The error returned was ‘ActiveX component can’t create object’ (0x1AD).

Resolution: We configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

Alert: Performance Module could not find a performance counter.

Issue: In PerfDataSource, could not resolve counter DirectoryServices, KDC AS Requests, Module will be unloaded.

Resolution: We created a Run As Account and configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

Alert: Script Based Test Failed to Complete.

Issue: AD Database and Log : The script ‘AD Database and Log’ failed to create object ‘McActiveDir.ActiveDirectory’. The error returned was ‘ActiveX component can’t create object’ (0x1AD)

Resolution: We installed OOMADS from the OpsMgr 2007 SP 1 CD.

Alert: This domain controller has been promoted to PDC.

Issue: No issue; this was an informational message. The message was generated when the PDC emulator role was moved between domain controllers.

Resolution: No action required. This message is provided for situations where the PDC emulator role was moved unexpectedly.

Alert: The Domain Changes report has data available.

Issue: No issue; this was an informational message. It was generated when the PDC emulator role was moved between domain controllers in the environment.

Resolution: No action required. This message is provided for situations where the PDC emulator role was moved unexpectedly.

Alert: AD Domain Performance Health Degraded.

Issue: More than 60% of the DCs contained in this AD Domain report a Performance Health problem.

Resolution: This alert indicates that more than 60% of the domain controllers in a domain have active performance alerts. The alert itself requires no action, but it does call for analysis to determine what is putting the domain controllers in a degraded state.

Alert: AD Site Performance Health Degraded.

Issue: More than 60% of the DCs contained in this AD Site report a Performance Health problem.

Resolution: This alert indicates that more than 60% of the domain controllers in a site have active performance alerts. The alert itself requires no action, but it does call for analysis to determine what is putting the domain controllers in a degraded state.
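The rollup behind the domain and site alerts is a percentage test. As a rough Python illustration of the logic (not the MP’s implementation), assuming a strict “more than 60%” comparison as worded in the alert text and a hypothetical mapping of DC name to health state:

```python
def rollup_degraded(states, threshold_pct=60.0):
    """Return True when more than threshold_pct of DCs are unhealthy.

    `states` maps DC name -> True if that DC reports a performance
    health problem. An empty mapping is treated as not degraded.
    """
    if not states:
        return False
    unhealthy = sum(1 for bad in states.values() if bad)
    return unhealthy * 100.0 / len(states) > threshold_pct
```

With five DCs, three unhealthy DCs is exactly 60% and would not trip a strict comparison, while four would. In small sites a single struggling DC can be enough to raise the rollup alert, which is why the underlying DC alerts are the place to start.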

Alert: Account Changes Report Available.

Issue: This is an informational alert. The details are in the AD SAM Account Changes report (available on the right side under Active Directory Domain reports).

Resolution: No resolution required. We checked the AD SAM Account Changes report to see the changes that were available.

During our testing, we had a period of time when we lost network connectivity to a site that had one of the domain controllers. The result was a flurry of alerts listed below:

Alerts:

Critical Alerts:

  • A problem with the inter-domain trusts has been detected
  • DNS 2008 Server External Addresses Resolution Alert
  • OleDB: Results Error

Warnings:

  • A problem has been detected with the trust relationship between two domains
  • AD Client Side – Script Based Test Failed to Complete (multiple)
  • Could not determine the FSMO role holder. (multiple)
  • DC has failed to synchronize its naming context with replication partners (multiple)

Issue: Loss of network connectivity between one site and another, both of which had domain controllers.

Resolution: Once network connectivity was re-established, we resolved all issues identified above.

 

UPDATE: 02/25/09

Alert: The Op Master Schema Master Last Bind latency is above the configured threshold.

Issue: A large number of alerts are generated at > 5 seconds for warning and > 15 seconds for error.

Resolution: Per http://technet.microsoft.com/en-us/library/cc749936.aspx, the effective thresholds should be changed to warning at > 15 seconds and error at > 30 seconds. We created an override for all objects of type Active Directory Domain Controller Server 2008 Computer Role to change Threshold Error Sec to 30 and Threshold Warning (sec) to 15, and stored it in the ActiveDirectory2008_Overrides management pack.

Alert: The Op Master Domain Naming Master Last Bind latency is above the configured threshold.

Issue: A large number of alerts are generated at > 5 seconds for warning and > 15 seconds for error.

Resolution: Per http://technet.microsoft.com/en-us/library/cc749936.aspx, the effective thresholds should be changed to warning at > 15 seconds and error at > 30 seconds. We created an override for all objects of type Active Directory Domain Controller Server 2008 Computer Role to change Threshold Error Sec to 30 and Threshold Warning (sec) to 15, and stored it in the ActiveDirectory2008_Overrides management pack.
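To see what the override changes in practice, here is a small Python sketch of a two-threshold classification. It illustrates the arithmetic only and is not the monitor’s code; the function name is ours and the strict “greater than” comparison is an assumption taken from the “>” wording above.

```python
def classify_latency(seconds, warning=15.0, error=30.0):
    """Map a bind latency in seconds to 'ok', 'warning', or 'error'."""
    if warning > error:
        raise ValueError("warning threshold must not exceed error threshold")
    if seconds > error:
        return "error"
    if seconds > warning:
        return "warning"
    return "ok"


# A 10-second bind under the 5/15 thresholds versus the overridden 15/30.
print(classify_latency(10, warning=5, error=15))
print(classify_latency(10))
```

A 10-second bind raises a warning under the 5/15 thresholds but stays quiet under 15/30, and a 20-second bind drops from an error to a warning. That is the reduction in alert volume we were after.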


13 Responses to OpsMgr by Example: The Active Directory 2008 Management Pack

  1. Unknown says:

    I have been struggling with some of these alerts since the new AD MP install and have found the following resolution for our situation. All the following alerts came in during the same time period:

    Alert: Could not determine the FSMO role holder.
    Source: <SERVER>
    Path: <SERVER FQDN>
    Last modified by: System
    Last modified time: 1/7/2009 12:01:24 AM
    Alert description: AD Replication Partner Op Master Consistency : Unable to determine schema Op Master on domain controller ‘<TARGET SERVER NETBIOS NAME>’.

    Alert: AD Client Side – Script Based Test Failed to Complete
    Source: <SERVER>
    Path: <SERVER FQDN>
    Last modified by: System
    Last modified time: 1/7/2009 12:01:24 AM
    Alert description: AD Replication Partner Op Master Consistency : The script ‘AD Replication Partner Op Master Consistency’ failed to execute the following LDAP query: ‘<LDAP://<TARGET SERVER FQDN NAME>/CN=Schema,CN=Configuration,DC=xx,DC=xx,DC=net>;(&(objectClass=dMD)(fSMORoleOwner=*));fSMORoleOwner;Subtree’. The error returned was ‘Table does not exist.’ (0x80040E37)

    Alert: AD Op Master is inconsistent
    Source: <SERVER>
    Path: <SERVER FQDN>
    Last modified by: System
    Last modified time: 1/7/2009 12:01:21 AM
    Alert description: The Domain Controller’s Op Master is inconsitent. See additional alerts for details.

    All these alerts are DNS related. In one situation, we had a bad DNS record on one of our top-level DNS servers. We could ping the NetBIOS name, but could not ping the FQDN (it was a DC in another domain within our forest). In the second instance, there was a bad IP address in the HOSTS file. Once all DNS resolution issues were resolved, the alerts auto-cleared.

    We also have the alerts come in and then auto-resolve on their own. This happened when someone rebooted a DC in another domain and that server was the only DC for their domain.

    I hope that helps. Here is a good link to investigate DNS issues: http://www.windowsnetworking.com/articles_tutorials/Using-NSLOOKUP-DNS-Server-diagnosis.html

    -CK

  2. Mark says:

    I just imported the latest version of the AD management pack(s), and unfortunately the infamous ‘AD Processor Overload (lsass) Monitor’ creates alerts. Could it be that the script still isn’t updated (http://blopon.blogspot.com/2008/04/lsass-monitor-in-active-directory-mp.html)? The solution does not work anymore, because the properties have changed in the new version. Hope you can help out! Regards, Mark

  3. Unknown says:

    You "had to override the “Support Tools Install Dir” on Windows 2008 to %windir%\system32 to make the task work correctly". Sounds correct, but *exactly how* did you set this override? I could not find this in the Authoring view or in the Authoring console. Could you post an XML snippet? Thanks

  4. Operations says:

    Regarding the "Support Tools Install Dir" question: The override was done when the task was actually run. It’s not created as an override in the OpsMgr console or in the Authoring view but rather when the task is executed. Hope this helps.

    • Ernie says:

      Hello, thanks for posting this information most useful. I have a few questions though please.

      I am having most of the issues mentioned here as well (Windows 2003 DC’s single domain)

      From what I have read, some of the AD related scripts that run on the agent utilize files from the “Support Tools” i.e. RepAdmin for example.

      I thought installation of the Support Tools on a server was optional, i.e. not installed with the base OS installation. If that is the case, it seems odd that an MP would rely on files not installed on a DC by default?

      I can install Support Tools on each DC, but the default installation path is “C:\Program Files\Support Tools”. Reading your post, it suggests they should be installed into %windir%\system32.
      Therefore, do I need to go around all my Windows 2003 DCs and install Support Tools again, but into the Windows\system32 directory?

      I also note your comment about incorrectly configured anti-virus on the domain controllers. We use Symantec Endpoint Protection. We have a global rule to not scan “c:\program files\system center operations manager 2007\health service state”. I guess the exclusion should also apply to domain controllers. I am not sure, was it Symantec you had the issue with?

      Thanks I would be grateful if you could kindly email the answers to the above
      ErnestBrant@Hotmail.co.uk

  5. Operations says:

    Responding to Mark Verbaas: We haven’t had the lsass overload issue, which is why it’s not included in the by Example write-up on this version of the AD MP. In our environment, there are no overrides in place for this particular monitor. The LSASS threshold percentage in the AD MP is 80%, which Pontus says should actually be 15% (http://blopon.blogspot.com/2008/04/lsass-monitor-in-active-directory-mp.html). It should be possible to override this to 15% as discussed in the article. There is no value for “min number of min between alerts” in the new version of the MP, so that change does not appear to apply any longer.

  6. Bill says:

    I’m also getting the "lsass process high processor load detected" alerts. The AD Processor Overload (lsass) Monitor has these parameters: Number of Samples: 3; Interval (sec): 10; Lsass Threshold (%): 80. I understand this to mean that if the lsass process uses 80% or more of the CPU over 3 consecutive samplings (each 10 seconds apart), an alert will be triggered. Is this correct? If so, why would you want to change the threshold to 15%? That would seem to increase the number of alerts that are generated. Thanks.

  7. Operations says:

    Our environment uses the default configurations and we have not created the override to set the threshold to 15%. We agree with your logic that setting the override to 15% may generate more alerts than necessary.

  8. Bill says:

    Thanks Operations Manager for your comment about the lsass process. On another forum I was told that the threshold doesn’t take multiple CPUs into account, so you need to multiply the threshold value by the number of CPUs in your DC. Also, your book has been a great help in getting me up to speed on SCOM.

  9. Jonathan says:

    Nice post! I have found similar alerts and I can confirm most of the solutions you gave. The only thing we are struggling with are the following alerts: “DC has failed to synchronize its naming context with replication partners,” “A problem has been detected with the trust relationship between two domains,” “AD Replication is occurring slowly,” and “Script Based Test Failed to Complete” (for multiple AD-related scripts). You mentioned that a misconfigured virus scanner is the cause of that. Could you clarify this a little more? It sounds like the Active Directory database must be excluded from scanning. I’m not sure at the moment if that’s the case, but I can check, of course. Thanks in advance!

  10. Shady says:

    Hey, I’m facing a problem: when I import the ADMP, CPU utilization is at 100% on the OpsMgr 2007 R2 server all the time. Is there any way to overcome this issue, please? Thanks

  11. Operations says:

    You don’t say what version of the ADMP you are running. Check http://blogs.technet.com/kevinholman/archive/2009/11/04/updated-active-directory-admp-management-pack-released-version-6-0-7065-0.aspx to see if this addresses your problem.
