The case for database clustering

We were recently asked for the top reasons to recommend clustering the OpsMgr database servers over other options for high availability. Since clustering is often simply assumed to be THE only high availability approach, the topic seemed worthy of a blog article.

Clustering is a fault-tolerant, high-performance, and scalable approach for database availability. It provides automatic failover and does not degrade overall SQL Server performance.

Other availability options include log shipping, database mirroring, and manual recovery.

The advantage of clustering over these is that once it is set up, failover requires no manual intervention. Manual recovery methods are of course … manual. In addition, if you don’t maintain the transaction log, manual recovery and log shipping don’t do you much good. By default the OpsMgr Operations database does not support forward recovery using the transaction log (it uses the simple recovery model), although you can change this option, which would allow you to implement log shipping. Due to the high processing requirements of the ACS database, we do not recommend log shipping this database.
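As a rough illustration of changing the recovery option mentioned above, here is a minimal Python sketch that only builds the T-SQL statements involved. It does not connect to a server. The helper names are hypothetical, and the database name `OperationsManager` is an assumption (the usual default name), so substitute your own.

```python
# Hypothetical helpers for illustration; not part of OpsMgr or SQL Server tooling.
RECOVERY_MODELS = ("FULL", "BULK_LOGGED", "SIMPLE")


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any ']' inside it."""
    return "[" + name.replace("]", "]]") + "]"


def recovery_model_query(database: str) -> str:
    """Build T-SQL that reports the current recovery model of one database."""
    escaped = database.replace("'", "''")  # escape single quotes in the literal
    return (
        "SELECT name, recovery_model_desc FROM sys.databases "
        f"WHERE name = N'{escaped}';"
    )


def set_recovery_model(database: str, model: str = "FULL") -> str:
    """Build T-SQL that switches a database to the given recovery model."""
    model = model.upper()
    if model not in RECOVERY_MODELS:
        raise ValueError(f"unknown recovery model: {model}")
    return f"ALTER DATABASE {quote_name(database)} SET RECOVERY {model};"


if __name__ == "__main__":
    db = "OperationsManager"  # assumed default name; adjust for your install
    print(recovery_model_query(db))
    print(set_recovery_model(db))
```

Keep in mind that after switching to the full recovery model, the log chain only starts with the next full backup, and the transaction log will keep growing unless it is backed up regularly.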

Then there is database mirroring, which was introduced in SQL Server 2005 and is supported for production use starting with Service Pack 1. It, like log shipping, uses a standby server, so neither option saves on hardware cost compared with clustering. Both of these methods also add performance overhead to maintain the transaction information on a standby server. Microsoft does not support SQL 2005 database mirroring functionality for any of the OpsMgr databases.

However, some organizations are hesitant to use clustering if they are not already familiar with the technology. Clustering does add complexity when supporting the database server in areas such as patch management. It also doesn’t help in cases of data corruption, because the cluster nodes share the same storage. Probably the most fail-safe approach for high availability would combine clustering with RAID 5 (or 6).

What’s the best approach?

On a per-server basis, you can minimize server failure and data corruption by using redundant drive storage (RAID 5/6) and server-class hardware, with active support agreements! Your goal is to make each server as redundant as is viable, and supportable as well. Clustering will give you failover capabilities if the server itself dies.

It really comes down to your business requirements. Generally for environments where high availability is required, we recommend clustering, which is best for performance and is a more proven technology than mirroring or log shipping.

This entry was posted in Operations Manager 2007.
