Why Did Your Availability Group Creation Fail?

Availability Groups are a fantastic way to provide high availability and disaster recovery for your databases, but they aren’t exactly the easiest thing in the world to pull off correctly. Doing it right takes a lot of planning and effort around your Availability Group topology. The funny thing about AGs is that, as hard as they are to plan…they’re pretty easy to implement…but sometimes things can go wrong. In this post I’m going to show you how to dig in when creating an AG fails.

While working at a customer site today I encountered an error that I hadn’t seen before when creating an Availability Group. I’m going to walk you through what happened and how I fixed it, so if your AGs fail at creation, you can follow the same process to dig into why.

First, let’s try to create our Availability Group

But, that fails and we get this error…it tells me what happened and to go look in the SQL Server error log for more details.

OK, so let’s look in the SQL Server error log and see what we find.

Clearly something is up, the AG tried to come online but couldn’t.
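If you’d rather pull those entries with a query than scroll through the log viewer, something along these lines should work. It’s a minimal sketch, assuming the SqlServer PowerShell module is installed; the instance name `SQLNODE1` and the search string are placeholders of mine, and newer versions of the module may also need `-TrustServerCertificate`.

```powershell
# Assumption: the SqlServer module is installed (Install-Module SqlServer).
Import-Module SqlServer

# Hypothetical instance name; use the replica where the AG creation failed.
$instance = 'SQLNODE1'

# sp_readerrorlog arguments: 0 = current log file, 1 = SQL Server error log,
# third argument = text to search for.
Invoke-Sqlcmd -ServerInstance $instance -Database master `
    -Query "EXEC sp_readerrorlog 0, 1, 'availability group';" |
    Format-Table LogDate, Text -Wrap
```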

The error here says to check out the Windows Server Failover Clustering log…so let’s go ahead and do that. But that’s not as straightforward as you’d think. WSFC does write to the event log, but the errors are pretty generic for this issue. Here’s what you’ll see in the System Event Log and the Cluster Events section in Failover Cluster Manager.

Wow, that’s informative, right? Luckily we still have more information to look into.
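The same events can be pulled from PowerShell if you don’t want to click through Event Viewer. This is a rough sketch, assuming the failures happened within the last hour; adjust the time window and event count to suit.

```powershell
# Error-level events written by the failover clustering provider to the System log.
# Note: Get-WinEvent raises an error if nothing matches the filter,
# hence -ErrorAction SilentlyContinue.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Level        = 2                          # 2 = Error
    StartTime    = (Get-Date).AddHours(-1)    # assumed window: the last hour
} -MaxEvents 20 -ErrorAction SilentlyContinue |
    Format-List TimeCreated, Id, Message
```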

Let’s dig deeper using the WSFC cluster logs

The cluster logs need to be generated; they’re not readily available as text for us. We can write them out to file with the PowerShell cmdlet Get-ClusterLog. Let’s make a directory and dump the logs into there.
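A minimal sketch of that step, run from an elevated PowerShell session on one of the cluster nodes. The folder `C:\ClusterLogs` and the 30-minute window are just examples I picked.

```powershell
# Hypothetical output folder; any path with enough free space will do.
$logDir = 'C:\ClusterLogs'
New-Item -ItemType Directory -Path $logDir -Force | Out-Null

# Get-ClusterLog ships with the FailoverClusters module.
Import-Module FailoverClusters

# Writes one log file per cluster node into $logDir.
# -TimeSpan limits the output to the last N minutes;
# -UseLocalTime writes timestamps in local time instead of UTC.
Get-ClusterLog -Destination $logDir -TimeSpan 30 -UseLocalTime
```

Keeping the time span short makes the files a lot easier to read, so it helps to dump the logs right after reproducing the failure.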

Now we have some data to look through!

When we look at the contents of the cluster log files generated by Get-ClusterLog, we’re totally on the other side of the spectrum when it comes to verbosity. The logs so far have been pretty terse and haven’t really told us what’s causing the failure…well, dig through this log and you’ll likely find your reason and a lot more. It’s good stuff to read if you want to understand the internals of WSFC. Now, the reason my Availability Group creation failed was permissions. Check out the log entries.
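These files get big, so searching beats scrolling. Here’s a sketch of how you might narrow things down, assuming the logs were dumped to `C:\ClusterLogs`. The search patterns are my guesses at useful starting points for a permissions problem like this one, not an exhaustive list.

```powershell
# Hypothetical folder holding the dumped cluster logs.
$logDir = 'C:\ClusterLogs'

# Search every dumped log for lines that mention the health-check procedure
# or a failed login. The patterns are treated as regular expressions.
Get-ChildItem -Path $logDir -Filter '*.log' |
    Select-String -Pattern 'sp_server_diagnostics', 'Login failed' |
    Select-Object -First 50 -Property Filename, LineNumber, Line |
    Format-List
```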

Well, that’s pretty clear about what’s going on…the process creating the AG couldn’t connect to SQL Server to run the very important sp_server_diagnostics stored procedure. A quick internet search for a fix turned up an article from Mike Fal, which points to a Microsoft article detailing the issue and the fix.

For those who don’t want to click the links, here’s the code to adjust the permissions so your Availability Group can be created.
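A sketch of that fix as I understand it from the Microsoft article, wrapped in PowerShell so it can be pushed to each replica. It assumes the cluster resource connects to SQL Server as `NT AUTHORITY\SYSTEM`; confirm the account named in your own cluster log before running it. The instance names are placeholders, and the SqlServer module is assumed to be installed.

```powershell
Import-Module SqlServer

# Hypothetical instance names; list every instance that hosts a replica.
$replicas = 'SQLNODE1', 'SQLNODE2'

# Server-level grants, so the batch runs in master.
$fix = @'
-- Assumption: the failing login is NT AUTHORITY\SYSTEM.
IF NOT EXISTS (SELECT 1 FROM sys.server_principals
               WHERE name = N'NT AUTHORITY\SYSTEM')
    CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS;

GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM];
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM];
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM];
'@

foreach ($instance in $replicas) {
    Invoke-Sqlcmd -ServerInstance $instance -Database master -Query $fix
}
```

Once the permissions are in place, rerun the AG creation script.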

So to review…here’s how we found our issue.

  1. Read the error the create script gives you
  2. Read the SQL Server error log
  3. Look at your System Event log
  4. Dump your Cluster Logs and review

Use this technique if you find yourself in a situation where your AG won’t come online or worse…fails over unexpectedly or won’t come back online.
