Disaster Recovery in System Center 2012 Operations Manager

We had the opportunity this week to test the new RMS emulator promotion feature of System Center 2012 Operations Manager (OpsMgr) in a real scenario. While we were experimenting with VMM and Microsoft iSCSI Target, the virtual hard drives of three VMs in the System Center 2012 OpsMgr lab were lost (oops, but it’s a lab, folks!). The lost VMs included the RMS emulator, a gateway, and one node of the SQL cluster. We were able to recover full function of the management group fairly quickly, and want to share the step-by-step recovery experience here.

RMS Emulator

  1. The management server running the RMS emulator role was lost. (Since no agents report directly to the RMS, there was no loss of monitoring availability.)
  2. Another management server was promoted to the RMS emulator role using PowerShell. Figure 1 shows the RMS emulator role being moved from server ‘helios.odyssey.com’ to ‘hannibal.odyssey.com’.
  3. A new VM with the same name and IP address as the original RMS emulator was spun up.
  4. The failed management server name was deleted from the Management Servers list in the OpsMgr console Administration space. (You can’t join the rebuilt management server to the management group without performing this step.)
  5. The OpsMgr management server role was installed on the rebuilt VM, following the procedure to install an additional management server to the management group.
  6. The original RMS emulator was promoted back into that role using PowerShell.

Figure 1 – OpsMgr PowerShell: Lost RMS emulator (Helios), promoting surviving management server to RMS role (Hannibal), and confirming.
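
The promotion in steps 2 and 6 can be sketched with the cmdlets in the OperationsManager module. This is a minimal sketch, assuming it is run on a surviving management server; the server names are the ones given for Figure 1, and the exact commands in the figure may differ.

```powershell
# Load the OpsMgr cmdlets (already loaded in the Operations Manager Shell).
Import-Module OperationsManager

# Show which management server currently holds the RMS emulator role.
Get-SCOMRMSEmulator

# Step 2: promote the surviving management server (Hannibal).
$survivor = Get-SCOMManagementServer -Name "hannibal.odyssey.com"
Set-SCOMRMSEmulator -Server $survivor

# Confirm that the role has moved.
Get-SCOMRMSEmulator

# Step 6: after Helios is rebuilt and rejoined, promote it back.
$original = Get-SCOMManagementServer -Name "helios.odyssey.com"
Set-SCOMRMSEmulator -Server $original
```

Running Get-SCOMRMSEmulator before and after each move is a cheap way to confirm where the role currently lives.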

Gateway

  1. A gateway server monitoring Internet-based computers was lost. (Downstream agents and gateways with failover gateway assignments will resume reporting to the failover gateway(s).)
  2. A new gateway with the same name and IP address as the original was spun up. (The failed gateway is not deleted in the OpsMgr console.)
  3. The OpsMgr gateway server role was installed on the rebuilt VM, pointing the gateway to the same management server the original gateway was assigned to.
  4. MOMCertImport.exe was run on the gateway to import a trusted OpsMgr management certificate.
  5. All downstream gateways and agents started to report into the rebuilt gateway without further action.
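
The certificate import in step 4 might look like the following. This is a sketch under assumptions: the gateway's certificate is already installed in the local computer's Personal store, the command is run from an elevated prompt in the folder that contains MOMCertImport.exe, and the subject name shown is a hypothetical placeholder.

```powershell
# Bind the gateway's certificate to OpsMgr by its subject name.
# "gateway01.example.com" is a placeholder; use the subject name of the
# certificate that was issued to the rebuilt gateway.
.\MOMCertImport.exe /SubjectName "gateway01.example.com"

# Restarting the health service is a common follow-up so that the
# certificate is picked up; it may not be needed in every case.
Restart-Service -Name HealthService
```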

SQL Server Cluster Node

  1. A node in a SQL cluster hosting an OpsMgr database was lost (the clustered SQL instance automatically failed over to the surviving cluster node).
  2. A new SQL cluster node with the same name and IP address as the original was spun up.
  3. The lost cluster node was evicted from the cluster.
  4. The new VM was configured with the same storage and networking as the original cluster node.
  5. The Failover Cluster feature was installed and the node joined to the cluster.
  6. SQL Server installation was completed for each clustered SQL instance by using the installation option to add a node to a SQL Server failover cluster.
  7. A replacement OpsMgr agent was pushed to the rebuilt database cluster node by running the repair agent task in the OpsMgr console Administration space, agent-managed computers node.
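
For anyone who prefers the shell to the consoles, steps 3, 5, and 7 have PowerShell equivalents. The following is a rough sketch, not what was run in the lab: the cluster and node names are hypothetical, the feature installation line assumes Windows Server 2008 R2 or later, and the three parts run on different machines, as noted in the comments.

```powershell
# --- On the surviving cluster node ---
Import-Module FailoverClusters

# Step 3: evict the lost node. "sqlcluster01" and "sqlnode02" are placeholders.
Remove-ClusterNode -Cluster "sqlcluster01" -Name "sqlnode02" -Force

# --- On the rebuilt node, after storage and networking are configured ---
# Step 5: install the Failover Clustering feature, then join the cluster.
Import-Module ServerManager
Add-WindowsFeature Failover-Clustering
Import-Module FailoverClusters
Add-ClusterNode -Cluster "sqlcluster01" -Name "sqlnode02"

# --- On a management server, after SQL Server has been added to the node ---
# Step 7: repair (reinstall) the agent on the rebuilt node.
Import-Module OperationsManager
Get-SCOMAgent -DNSHostName "sqlnode02.example.com" | Repair-SCOMAgent
```

Step 6 (adding the node to each clustered SQL instance) is left to SQL Server Setup, as described above.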