Tag Archives: SQL

Instant File Initialization in SQL Server on Linux

Earlier this week Ned Otter (@NedOtter) brought up a question about Instant File Initialization on SQL Server on Linux; check out the thread here. I was up way too early in the morning, as I normally am, so I decided to poke around and see how it was done. SQL Server pros, here you'll see that you can get deep internal information from the OS very easily. Hopefully with this blog post you'll be able to compare how this is done on Windows and draw the connections between the two platforms.

Let’s check it out…

SQL on Linux Internals

First, the internals of SQL Server on Linux leverage a process virtualization technique called SQLPAL. Inside SQLPAL is a Win32 environment custom tailored to support SQL Server. Next, SQLPAL needs a way to talk to Linux so that it can access the physical resources of the system, and it does this via something called the Host Extensions. Essentially the HE map SQLPAL's Win32 and SQLOSv2 API calls to Linux system calls. In Linux, system calls provide access to system resources: things like CPU, memory and network or disk I/O. And that's the flow: SQLPAL calls the Host Extensions, which make system calls to interact with system resources.

Using strace to Collect System Calls

Knowing this, we can leverage Linux tools to see what SQL Server is doing when it interacts with the operating system. Specifically, we can leverage tools like strace to see which system calls it uses to perform certain tasks, and in this case we're going to look at how SQL on Linux implements Instant File Initialization.

Needless to say, do not reproduce this on a system that's important to you.

Attach strace to your currently running SQL Server process. Let's find our SQL Server process:

[root@server2 ~]# ps -aux | grep sqlservr

mssql      1414  3.0  0.4 198156 18244 ?        Ssl  06:23   0:04 /opt/mssql/bin/sqlservr

mssql      1416  6.3 15.9 1950768 616652 ?      Sl   06:23   0:08 /opt/mssql/bin/sqlservr

 

strace -t -f -p 1416 -o new_database.txt

Let's walk through the strace parameters here: -t adds a time stamp, -f will attach strace to any threads forked from our traced process, -p is the process we want to trace, and -o writes the trace output to the named file.

Creating a Database with Instant File Initialization

With strace up and running, let's turn on the trace flags to enable output for Instant File Initialization and create a database that has a 100MB data file and a 100MB log file. Check out this post from Microsoft for more details on the trace flags. This database create code is straight from their post; I changed the model database's data and log file sizes to 100MB each. Also, it's important to note Instant File Initialization is only for data files; log files are zeroed out due to requirements for crash recovery. We're going to see that in action in a bit…
 

DBCC TRACEON(3004,3605,-1)

GO

 

CREATE DATABASE TestFileZero

GO

 

EXEC sp_readerrorlog

GO

 

DROP DATABASE TestFileZero

GO

 

DBCC TRACEOFF(3004,3605,-1)

 
Once the database creation is done, stop your strace and let’s go and check out the data gathered in the file. 

Poking Around in Linux Internals, Creating the MDF

Inside your output file you're going to see a collection of system calls. In Linux, a system call is the way a user space program asks the kernel to do work on its behalf. In this case, SQL Server on Linux is asking the kernel to create a data file, create a log file and zero out the log file. So let's check out what happens when the MDF is created.
 
1630  09:03:28.393592 open("/var/opt/mssql/data/TestFileZero.mdf", O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660) = 154

First thing, that 1630: that's the ID of the thread doing the work here. It's different from the PID we attached strace to because it's a thread created when we execute our database create statements.
 
Next you see a call to open, which opens the file TestFileZero.mdf. The second parameter is a set of flags telling open what to do: O_RDWR opens the file for read/write access, O_CREAT creates the file, O_EXCL causes the open to fail if the file already exists, and O_DIRECT enables synchronous I/O to the file, bypassing the file system cache. 0660 is the file mode, and the return value, 154, is the file descriptor for the file created. A file descriptor (fd) is used to represent and provide access to the file that was just opened. We'll pass this to other system calls so they can interact with the file addressed by the fd.

1630  09:03:29.087471 fallocate(154, 0, 0, 104857600 <unfinished …> = 0

1630  09:03:29.087579 <… fallocate resumed> ) = 0


Next, we see a call to fallocate on file descriptor 154. The first 0 is the mode, which tells fallocate to allocate disk space within the range specified in the third and fourth parameters, offset and length respectively; here that's from 0 to 100MB (104857600 bytes). If you read the man page for fallocate, there is a FALLOC_FL_ZERO_RANGE flag that can be passed in the mode parameter. In this call mode is set to 0, so that flag is not set. Further, the man page indicates that this flag is supported on XFS as of Linux kernel 3.15, and I'm on 3.10 using CentOS 7.4. So there's no zeroing magic happening at the file system level.

1630  09:03:29.087603 ftruncate(154, 104857600) = 0


Next, there's a call to ftruncate on fd 154. This call sets the length of the file to the size passed in, in this case 100MB.

1630  09:03:29.091223 io_submit(140030075985920, 1, [{data=0x185b4dc48, pwrite, fildes=154, str="\1\v\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0c\0\0\0\2\0\374\37"…, nbytes=8192, offset=8192}]) = 1


Next, we'll see a sequence of calls using io_submit, which submits asynchronous I/O to disk. The parameters for this one are an aio_context_t, the number of blocks in the request, then an array of AIO control blocks. The AIO control blocks are what's inside the brackets []; the key parameters are pwrite and fildes, the write operation and the file descriptor, which matches the fd that we've been working with, 154. These write operations (17 in my trace) are laying out the mdf file, which has a header and other metadata through the file, but they're certainly not zeroing the file out. We'll see how zeroing works when we get to the LDF.
 

1630  09:03:29.098452 fsync(154 <unfinished …>

1630  09:03:29.098640 <… fsync resumed> ) = 0


The fsync call flushes all modified buffers for the file referenced by the descriptor out to disk.
 

1630  09:03:29.099260 close(154)        = 0


The close call closes the file descriptor and releases any locks held on the file; the descriptor number is then free to be reused.

Poking Around in Linux Internals, Creating and Zeroing the LDF

So that was the creation of the data file; now let's check out how the transaction log file is created. Remember, Instant File Initialization only applies to data files; transaction log files must be zeroed out for crash recovery. Let's dig in.
 

1630  09:03:29.831413 open("/var/opt/mssql/data/TestFileZero_log.ldf", O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660) = 154

 
We see an open call again, nearly identical, and the file descriptor returned is again 154. 

1630  09:03:30.395757 fallocate(154, 0, 0, 104857600) = 0

 
There’s a call to fallocate to provision the underlying storage.

1630  09:03:30.395814 ftruncate(154, 104857600) = 0


Then we see a call to ftruncate again to ensure the size of the file is 100MB.

1630  09:03:30.396466 fsync(154 <unfinished …>

1630  09:03:30.397672 <… fsync resumed> ) = 0


Then there’s a call to fsync again, flushing buffers to disk.
 

1630  09:03:30.400042 write(1, "Z", 1)  = 1

1630  09:03:30.400088 write(1, "e", 1)  = 1

1630  09:03:30.400134 write(1, "r", 1)  = 1

1630  09:03:30.400180 write(1, "o", 1)  = 1

1630  09:03:30.400246 write(1, "i", 1)  = 1

1630  09:03:30.400301 write(1, "n", 1)  = 1

1630  09:03:30.400348 write(1, "g", 1)  = 1

…output omitted


Now things get special…we see a series of write calls. These writes aren't going to the file we created…they're going to standard out, as indicated by the first parameter, 1, which is the file descriptor for standard out. The second parameter is the data to be written, in this case a single character; the third parameter is the size of the data being written. The return value, the last 1 on the line, is the number of bytes written. And guess where this data is being sent to…the SQL Server Error Log! See the string "Zeroing"?
Zeroing transaction log start
 

1630  09:03:30.406250 io_submit(140030075985920, 1, [{data=0x185b62308, pwritev, fildes=154, iovec=[{"\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300"…, 4096}…], offset=0}]) = 1

…Output omitted

1630  09:03:30.454831 io_submit(140030075985920, 1, [{data=0x185b4dc48, pwritev, fildes=154, iovec=[{"\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300"…, 4096}…], offset=100663296}]) = 1

 
Now for the log file we're going to see io_submit calls again, this time MANY more. From the output above I show the first call starting at offset = 0 and the last starting at offset = 100663296; there are several in between. Each io_submit writes 4MB of data, and that last call starts at the 96MB mark (100663296 bytes), so adding its 4MB of I/O gives us our fully zeroed 100MB log file.
 
The difference in the time stamps between the first call and the start of the last call is 48.581ms; strace's time stamps have microsecond resolution.
 

1630  09:03:30.460172 write(1, "Z", 1)  = 1

1630  09:03:30.460215 write(1, "e", 1)  = 1

1630  09:03:30.460259 write(1, "r", 1)  = 1

1630  09:03:30.460315 write(1, "o", 1)  = 1

1630  09:03:30.460361 write(1, "i", 1)  = 1

1630  09:03:30.460406 write(1, "n", 1)  = 1

1630  09:03:30.460450 write(1, "g", 1)  = 1


When the io_submit calls are finished, we see another series of writes to standard out, and the same data shows up in the SQL Error Log, telling us zeroing is finished and that it took 60ms…very close to what strace's time stamps reported! There are additional calls between the start and stop zeroing messages that aren't included in my output here, plus the time the last IO takes to complete.
 
Zeroing transaction log finish

1630  09:03:30.469088 fsync(154 <unfinished …>

1630  09:03:30.469532 <… fsync resumed> ) = 0


fsync is called again to flush the buffers for this file to disk.

1630  09:03:30.469977 close(154)        = 0

 
Then we close the file.
 
In your strace data you'll see another zeroing sequence for the tail of the log; here's the output from the SQL Error Log.

Zeroing tail of the log

And there we have it: this is how Instant File Initialization works on SQL Server on Linux, with a look at some Linux kernel internals along the way!

Warning Handling in dbatools Automation Tasks

So I’ve been using dbatools for automated restore tasks and came across a SQL Server Agent job that I wrote that was reporting success but the job was actually failing.

What I found was the function I used, Restore-DbaDatabase, was not able to access the path that I was trying to restore databases from. The Restore-DbaDatabase function, and all dbatools functions according to the dbatools team on Slack, will throw a Warning rather than an Error by design.

When scheduling PowerShell scripts that use dbatools with SQL Server Agent, we need to use the SQL Agent CmdExec subsystem so we can load in additional modules. So we'll have a SQL Agent job step that looks like this.

SQL Agent Job - cmdexec

 

Now, you see that line "Process exit code of a successful command" and that it's set to 0? Well, that's the first thing I tested. I wanted to see if the warning generated by Restore-DbaDatabase returned a non-zero value…it didn't; it returns 0. You can check this by examining %ERRORLEVEL% after running the PowerShell script defined in this job step's command box at the command line.
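If you want to see this for yourself, run the job step's command from a cmd.exe prompt and echo the exit code; the script path here is just a placeholder for whatever your job step runs.

powershell.exe -File C:\scripts\restore_script.ps1
echo %ERRORLEVEL%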

These scripts are very small; most only do one thing…restore a database. I want them to report failure when something goes wrong, so how can we get that warning to cause the SQL Agent job to report failure?

We have two options here.

Our first option is to adjust how our session handles warnings; we can do that with the $WarningPreference preference variable.
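Here's what that looks like; $WarningPreference is a standard PowerShell preference variable, and it goes at the top of the script before the dbatools calls.

$WarningPreference = "Stop"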

Doing this will cause the script to stop executing when it hits the warning, and then the job will report failure.

Our next option is to use the -Silent parameter on our Restore-DbaDatabase function call. The -Silent parameter causes the warnings in our script to be reported as errors.
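A sketch of what that call might look like; the instance name and backup path are placeholders, and note that in newer dbatools releases -Silent has been replaced by -EnableException.

Restore-DbaDatabase -SqlInstance server1 -Path \\backupserver\backups -Silent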

Both of these options cause the return value of our CmdExec subsystem's call to powershell.exe to be 1…which will cause our Agent job to report failure. This is exactly what I want!

One other thing I tested: both of these options cause the script to stop at the point of the error. When using -Silent, the function writes what it tried to do to standard output; when using $WarningPreference I did not get that output.

Thanks to Friedrich Weinmann and Shawn Melton for helping me sort this all out!

T-SQL Tuesday

Thanks to SQL DBA with A Beard for this event – https://sqldbawithabeard.com/2017/09/05/tsql2sday-94-lets-get-all-posh/

Speaking at PASS Summit 2017

I'm very pleased to announce that I will be speaking at PASS Summit 2017! This is my first time speaking at PASS Summit and I'm very excited to be doing so! What's more, I get to help blaze new ground on an emerging technology, SQL Server on Linux! My session is Monitoring Linux Performance for the SQL Server Admin, so if you're a Windows or SQL Server administrator, this session is for you. We'll look at some of the internals of SQL Server on Linux, dive into Linux OS internals and show you where to look inside Linux for the most important performance data for your SQL Server. I hope to see you there!

 

Monitoring Linux Performance for the SQL Server Admin

Abstract

So you're a SQL Server administrator and you just installed SQL Server on Linux. It's a whole new world. Don't fear, it's just an operating system. It has all the same components Windows has, and in this session we'll show you that. We will look at the Linux operating system architecture and show you where to look for the performance data you're used to! Further, we'll dive into SQLPAL and how its architecture and internals enable high performance for your SQL Server. By the end of this session you'll be ready to go back to the office with a solid understanding of performance monitoring for Linux systems and SQL on Linux. We'll look at the core system components of CPU, disk, memory and networking, monitoring techniques for each, and some of the new tools available, including new DMVs and DBFS.

 


Reflecting on the Last Year of Microsoft's Open Source Technologies

This past year has certainly been interesting in the world of Linux. Microsoft has taken a new strategy and is embracing the open source model, releasing its key software products with versions for Linux. It's truly a remarkable time. In this post I want to highlight some of the bigger events, cover what this means to you, and point to where you can get some training on these topics.

Here are some of the highlights from the last year.

Microsoft becomes a Platinum Member in the Linux Foundation – this means Microsoft is committing itself to a long-term investment in the open source community and to continuing to develop open source software. Don't believe me on the open source thing…well, check out their GitHub repo. Who would have seen this coming?

Now, let's look at the new tools you have to build cross platform applications and develop your systems:

  • .NET Core – You can literally build native .NET applications to run on any platform: Windows, Linux, Mac…Docker!
  • Bash on Ubuntu on Windows – One of the primary reasons I bought my first Mac years ago was that I wanted a bash shell; well, now I'm not tied to that hardware anymore.
  • Visual Studio Code – With all this cross platform stuff, you’ll need a consistent development environment, VS Code runs on Windows, Linux and Mac. And it’s darn nice too. Very extensible with many languages available. 
  • SQL Server on Linux – This is the real deal, it’s fast and consistent with your existing SQL Server experience. I’ve blogged about it a bit :)
  • PowerShell Core – Microsoft adds another management tool to your tool belt: Windows, Linux and Mac can all be managed with one language. For me this was mind blowing; I got to do a training video with Jeffrey Snover, literally the inventor of PowerShell, and MVP Jason Helmick! I blogged about PowerShell a bit too.

What does this mean to you?

So what's this mean to you? Get out there, start learning about this stuff and discover how it can impact you. In the coming years new solutions are going to be built on these components, and it's on you to train yourself and learn how to leverage these tools to solve problems.

I’ve spent the last year developing some fun training at Pluralsight I think you should check out. The training is based on the Linux Foundation Certified Engineer curriculum and takes you from installation up to a running Linux system. 

  • Understanding and Using Essential Tools for Enterprise Linux 7 – If you're new to Linux, start here! This course will help you install Linux and get oriented with the operating system and the command line interface.
  • LFCE: Advanced Network and System Administration – Next, you'll need to learn how to control your system's services, install packages, manage performance and share data between systems. Check this course out to make your Linux system really work for you.
  • LFCE: Advanced Linux Networking – Your systems don't stand alone; in this course you'll dive deep into how data moves between Linux systems. Protip: these concepts apply to Windows systems too.
  • LFCE: Network and Host Security – My newest course; let's learn how to secure our Linux systems from both the networking and host perspective. We'll cover security concepts and architectures, securing Linux services, and take a deep dive into OpenSSH and remote access.
  • LFCE: Linux Service Management – HTTP Services – I’m currently developing a course on HTTP Services – you’ll learn how to install, configure and manage Apache.
  • More to follow – announcements coming up soon! I can’t wait to tell you what’s next.

I've got tons of blog posts on these topics, too.

So go ahead and dig in: download Linux (yes, I prefer CentOS), install SQL Server and PowerShell, and start moving your skills towards where the technology is going to take you!

Speaking at SQLSaturday Sacramento – 650!

Speaking at SQLSaturday Sacramento!

I'm proud to announce that I will be speaking at SQLSaturday Sacramento on July 15th 2017! And wow, 650 SQLSaturdays! This one won't let you down. Check out the amazing schedule!

If you don't know what SQLSaturday is, it's a whole day of SQL Server training available to you at no cost!

If you haven't been to a SQLSaturday, what are you waiting for? Sign up now!


This year I have TWO sessions!

1. Linux OS Fundamentals for the SQL Admin

SQL Server and PowerShell are now available on Linux, and management wants you to leverage this shift in technology to more effectively manage your systems, but you're a Windows admin! Don't fear! It's just an operating system! It has all the same components Windows has, and in this session we'll show you that. We will look at the Linux operating system architecture and show you how to interact with and manage a Linux system. By the end of this session you'll be ready to go back to the office and get started working with Linux, with a fundamental understanding of how it works.

2. Designing High Availability Database Systems using AlwaysOn Availability Groups

Are you looking for a high availability solution for your business critical application? You've heard about AlwaysOn Availability Groups and they seem like a good solution, but you don't know where to start. It all starts with a solid design. In this session we introduce the core concepts needed to design an Availability Group based system, covering topics such as recovery objectives, replica placement, failover requirements, synchronization models, quorum, backup and recovery, and monitoring. This session is modeled after real world client engagements conducted by Centino Systems that have led to many successful Availability Group based systems supporting tier 1 business critical applications.

dbfs – command line access to SQL Server DMVs

With SQL Server on Linux, Microsoft has recognized that they're opening up their products to a new set of users: people who aren't used to Windows and its tools. In the Linux world we have a set of tools that work with our system performance data and present it to us as text. Specifically, the placeholder for nearly all of the Linux kernel's performance and configuration data is the /proc virtual file system, procfs. Inside here you can find everything you need that represents the running state of your system. Processes, memory utilization, disk performance data…all of this is presented as files inside directories under /proc.

Now, let's take this idea and extend it to SQL Server. In SQL Server we have DMVs, dynamic management views. These represent the current running state of our SQL Server. SQL Server exposes the data in DMVs as table data that we can query using T-SQL.

So Microsoft saw the need to bring these two things together: expose the internals of SQL Server and its DMVs to the command line via a virtual file system. And that's exactly what dbfs does, it exposes all of SQL Server's DMVs as text files in a directory. When you access one of the text files, a query executes against the SQL Server and the query output comes back via standard output to your Linux console. From there you can use any of your Linux command line fu…and do what you want with the data returned.

Setting up dbfs

So first, let's go ahead and set this up. I already have the Microsoft SQL Server repo configured so I can install via yum; if you have SQL on Linux installed, you likely already have this repo too. If not, go ahead and follow the repo setup instructions here. To install dbfs we use yum on RHEL based distributions.
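With the repo in place, the install should be a one-liner; I'm assuming the package is named dbfs, as in the project's docs.

sudo yum install dbfs -y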

First off, think about what's going on under the hood here…we're going to allow the system to execute queries against DMVs…so let's try to keep this as secure as possible. I'm going to create a user that is allowed only to query DMVs, via the VIEW SERVER STATE permission. So let's do that…
 
Let's log into our SQL Server via sqlcmd and execute this code to create a user named dbfs_user.
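Something along these lines; a server-level login is all we need here since VIEW SERVER STATE is a server permission, and the password below is a placeholder.

sqlcmd -S localhost -U sa

CREATE LOGIN dbfs_user WITH PASSWORD = 'UseAStrongPasswordHere1!';
GO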

Once created, let's assign this user permission to query DMVs.
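That's the VIEW SERVER STATE permission mentioned above.

GRANT VIEW SERVER STATE TO dbfs_user;
GO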
The next step is to create a directory where dbfs will place all the files representing the DMVs we wish to query.
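The location is up to you; /tmp/dbfs is just my choice here.

mkdir /tmp/dbfs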
Now, let's go ahead and configure dbfs. I'm going to place its configuration file in /etc/, since that's the standard location for configuration files on Linux systems.
And inside that file, let's use the following configuration. Pretty straightforward: define a configuration name (here you see server1), the hostname (the locally installed SQL instance), the username and password of the user we just created, and a version. While this isn't very well documented, the code shows that if you're on version 16 (SQL Server 2016) or newer it will also create dbfs files with a .json file extension, which expose your DMV data as…you guessed it, JSON. Also, if you want to add a second server to dbfs, just repeat the configuration block inside the same text file.
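Based on the format in the project's documentation, the file, which I'm saving as /etc/dbfs.config, looks something like this; again, the password is a placeholder.

[server1]
hostname=localhost
username=dbfs_user
password=UseAStrongPasswordHere1!
version=16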

Running dbfs

Now, with all the preliminaries out of the way, let's launch dbfs. Basic syntax here: the actual program name, with the -c parameter pointing to the configuration file we just created and the -m parameter pointing to the directory we want to "mount" our DMVs into.
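Assuming the config file and mount directory from above, that looks like this.

dbfs -c /etc/dbfs.config -m /tmp/dbfs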
What's interesting about dbfs is that if you log out, dbfs stays running. Honestly, I don't like that; if that's how it's going to behave, it should run as a service managed by systemd or whatever init daemon your Linux distribution uses. I mentioned that on their GitHub repo. If it's going to be a user process, then I should have the choice to background the task myself.

Using dbfs

Looking at the source for dbfs, it gets a list of all DMVs from sys.system_views on the SQL Server you configured it to connect to, then creates a file for each and every one of those DMVs. So we have full coverage of all the DMVs available to us, and since you can use any bash command line fu to access the data…the options are really limitless. Microsoft has a few good demos on the GitHub repo here. Let's walk through a few examples now.
 
Accessing a DMV

This is pretty straightforward: you read from the file just like you would read from any other file on a Linux system. So let's do that…and we'll add column's -t option to make sure all the columns are aligned in the output.
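Here I'm reading dm_exec_connections, which is where the connect_time column discussed below comes from; a subdirectory named for each configured server shows up under the mount point.

cat /tmp/dbfs/server1/dm_exec_connections | column -t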

And our output looks like this…

Notice in the output above how the connect_time column is split incorrectly? We need to tell column to use the tab as a delimiter; by default it uses whitespace. So let's do that…
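column's -s option sets the input delimiter, and $'\t' is bash syntax for a literal tab.

cat /tmp/dbfs/server1/dm_exec_connections | column -t -s$'\t'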
And now our output looks much better

Selecting a subset of columns

Well, you probably noticed that the output is a bit unruly since it's outputting all of the DMV's columns. So let's tame that a bit and pull out particular columns. To do that we'll use a tool called awk, which prints columns based on a numeric index: $1 is the first column and so on.
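For example, pulling out the first and seventh columns might look like this.

cat /tmp/dbfs/server1/dm_exec_connections | awk '{ print $1, $7 }'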
And our output looks like this
Something isn't right…as DBAs we think of things in rows and columns, so we count across the top and expect the 7th field to yield the 7th column's data for each row, right? Well, it will, but data processed by awk is whitespace delimited by default and is processed row by row. So the 7th field on the second line isn't necessarily the same as the 7th field in the header line. This can be really frustrating if your row data has spaces in it…like, you know, dates.
 
So let's fix that…the output from the DMVs via dbfs is tab delimited. We can define our delimiter for awk with -F, which will allow for whitespace in our data, breaking the data only on the tabs. Let's hope there aren't any tabs in our data!
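Same command as before, now splitting fields only on tabs.

cat /tmp/dbfs/server1/dm_exec_connections | awk -F'\t' '{ print $1, $7 }'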
And the output from that looks like this: much better, but we don't have the nice columns.
We're so close. We can't just throw column on the end to make this nice and columnar, because awk in this configuration removes the tab delimiters from its output stream, and column by default will do the same thing. But if we have awk put the tabs back on output, we can let column do the work for us and split on them.
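One way to do that: set awk's output field separator to a tab as well, so the tabs survive into column.

cat /tmp/dbfs/server1/dm_exec_connections | awk -F'\t' -v OFS='\t' '{ print $1, $7 }' | column -t -s$'\t'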
And voila, we end up with some nice neatly formatted output

Searching in Text

We can search for text in the output using grep. Here's a quick example looking for the dedicated admin connection in dm_os_schedulers.
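Something like this; the DAC scheduler shows up in dm_os_schedulers with a status of VISIBLE ONLINE (DAC), so a case-insensitive search finds it.

grep -i 'dac' /tmp/dbfs/server1/dm_os_schedulers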
And here’s the output. 

SQL folks, keep in mind grep will only output matched lines, so we lose the column headers here since they're part of the standard output stream when accessing the file/DMV data.

Moving forward with dbfs

We need the ability to execute more complex queries from the command line; Vin Yu mentions this here. As DBAs we already have our scripts that we use day to day to help us access, and more importantly make sense of, the data in the DMVs, so dbfs should allow us to execute those scripts somehow. I'm thinking we could have it read a folder on the local Linux system at runtime, create files for those scripts, throw them into the mounted directory and allow them to be accessed like any of the other DMVs. The other option is to place those scripts as views on the server and access them via dbfs. Pros and cons either way. Since it's open source…I'm thinking about implementing this myself :)

Next, somehow we need the ability to maintain column context throughout the output stream; for DBAs it's going to be a tough sell having to deal with that. I know JSON is available, but we're talking about DBAs and sysadmins as the target audience here.

In closing, this is a great step forward…giving access to the DMVs from the command line opens up SQL Server to a set of people who are used to accessing performance data this way. Bravo!

Speaking at SQL Saturday Pensacola!

I’m proud to announce that I will be speaking at SQL Saturday Pensacola on June 3rd 2017! Check out the amazing schedule!

If you don't know what SQLSaturday is, it's a whole day of SQL Server training available to you at no cost!

If you haven't been to a SQLSaturday, what are you waiting for? Sign up now!

My presentation is "Designing High Availability Database Systems using AlwaysOn Availability Groups".

Abstract:

Are you looking for a high availability solution for your business critical application? You've heard about AlwaysOn Availability Groups and they seem like a good solution, but you don't know where to start. It all starts with a solid design. In this session we introduce the core concepts needed to design an Availability Group based system, covering topics such as recovery objectives, replica placement, failover requirements, synchronization models, quorum, backup and recovery, and monitoring. This session is modeled after real world client engagements conducted by Centino Systems that have led to many successful Availability Group based systems supporting tier 1 business critical applications.

Learning Objectives: 

This session highlights the importance of doing thorough design work up front. Attendees will learn the core concepts needed for successful Availability Group based systems, including recovery objectives, replica placement, failover requirements, synchronization models, quorum, backup and recovery, and monitoring. From this session attendees will have a firm footing on where to start when designing their AlwaysOn Availability Group based systems.

Why Did Your Availability Group Creation Fail?

Availability Groups are a fantastic way to provide high availability and disaster recovery for your databases, but they aren't exactly the easiest thing in the world to pull off correctly. To do it right, a lot of planning and effort goes into your Availability Group topology. The funny thing about AGs is, as hard as they are to plan…they're pretty easy to implement…but sometimes things can go wrong. In this post I'm going to show you how to look into things when creating your AGs fails.

When working at a customer site today I encountered an error that I hadn't seen before when creating an Availability Group. So I'm going to walk you through what happened and how I fixed it. If your AGs fail at creation, you can follow this process to dig into why.

First, let's try to create our Availability Group.
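The exact statement depends on your topology; here's a hypothetical two-replica sketch of the shape of what I was running, with made-up server and database names.

CREATE AVAILABILITY GROUP [AG1]
FOR DATABASE [MyDatabase]
REPLICA ON N'SQLNODE1' WITH (
    ENDPOINT_URL = N'TCP://sqlnode1.domain.local:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = AUTOMATIC),
N'SQLNODE2' WITH (
    ENDPOINT_URL = N'TCP://sqlnode2.domain.local:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = AUTOMATIC);
GO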

But that fails, and we get this error…it tells me what happened and to go look in the SQL Server error log for more details.

OK, so let's look in the SQL Server error log and see what we find.

Clearly something is up, the AG tried to come online but couldn’t.

The error here says to check the Windows Server Failover Clustering log…so let's go ahead and do that. But that's not as straightforward as you'd think. WSFC does write to the event log, but the errors are pretty generic for this issue. Here's what you'll see in the System Event Log and in the Cluster Events section of Failover Cluster Manager.

Wow, that’s informative, right? Luckily we still have more information to look into.

Let's dig deeper using the WSFC cluster logs

The cluster logs need to be generated; they're not readily available as text for us. We can write them out to a file with the PowerShell cmdlet Get-ClusterLog. Let's make a directory and dump the logs into there.
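The destination directory here is just my choice; Get-ClusterLog writes a text log file for each cluster node into it.

mkdir C:\temp\clusterlogs
Get-ClusterLog -Destination C:\temp\clusterlogs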

Now we have some data to look through!

When we look at the contents of the cluster log files generated by Get-ClusterLog, we're totally on the other side of the spectrum when it comes to information verbosity. The logs so far have been pretty terse and haven't really told us what's causing the failure…well, dig through this log and you'll likely find your reason and a lot more information. It's good stuff to look at to get an understanding of the internals of WSFC. The reason my Availability Group creation failed was permissions. Check out the log entries.

Well, that's pretty clear about what's going on…the process creating the AG couldn't connect to SQL Server to run the very important sp_server_diagnostics stored procedure. A quick internet search for a fix yielded this article from Mike Fal (b | t), which points to this Microsoft article detailing the issue and fix.

For those that don't want to click the links, here's the code to adjust the permissions and allow your Availability Group to create.
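This is the fix as I recall it from the linked Microsoft article: grant the missing permissions to the account used to run sp_server_diagnostics, NT AUTHORITY\SYSTEM. Double-check the article for your scenario.

GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM];
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM];
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM];
GO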

So to review…here’s how we found our issue.

  1. Read the error the create script gives you
  2. Read the SQL Server error log
  3. Look at your System Event log
  4. Dump your Cluster Logs and review

Use this technique if you find yourself in a situation where your AG won't come online or, worse…fails over unexpectedly or won't come back online.

Using dbatools for automated restore and CHECKDB

OK, so if you haven’t heard of the dbatools.io project run by Chrissy LeMaire and company…you’ve likely been living under a rock. I strongly encourage you to check it out ASAP. What they’re doing will make your life as a DBA easier…immediately. Here’s an example…

One of the things I like to do as a DBA is backup my databases, restore them to another server and run CHECKDB on them. There are some cmdlets in the dbatools project, in particular the Snowball release, that really make this easy. In this post I’m going to outline a quick solution I had to throw together this week to help me achieve this goal. We’ve all likely written code to do this using any number of technologies and techniques…wait until you see how easy it is using the dbatools project.

Requirements

  1. Automation – Complete autopilot, no human interaction.
  2. Report job status – Accurate reporting in the event the job failed, the CHECKDB failed or the restore failed.

Solution

  1. Use dbatools cmdlets for restore and CHECKDB operations
  2. Use SQL Agent Job automation, logging and alerting

So let’s walk through this implementation together.

Up first, here's the PowerShell script used to restore and CHECKDB the database. Save this code into a file named restore_databases.ps1.
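Here's a sketch of the shape of that script; the server name, paths and database name are placeholders, and the parameter names follow current dbatools conventions, so check them against the version you have installed.

$ErrorActionPreference = "Stop"

# Placeholders: where the backups live and where the restored files should land
$backupPath = '\\backupserver\sqlbackups\MyDatabase'
$dataPath = 'D:\SQLData'
$logPath = 'D:\SQLLog'
$server = 'checkdbserver'
$database = 'MyDatabase'

# Traverse the backup share and build/run the restore sequence from the most recent backups
Restore-DbaDatabase -SqlInstance $server -Path $backupPath -DestinationDataDirectory $dataPath -DestinationLogDirectory $logPath -WithReplace -IgnoreLogBackup

# Clean up the restored copy: simple recovery, then shrink the log (file id 2 is the log in a default layout)
Invoke-SqlCmd2 -ServerInstance $server -Query "ALTER DATABASE [$database] SET RECOVERY SIMPLE;"
Invoke-SqlCmd2 -ServerInstance $server -Database $database -Query "DBCC SHRINKFILE(2, 1);"

# CHECKDB can run for a long time, so disable the query timeout
Invoke-SqlCmd2 -ServerInstance $server -Query "DBCC CHECKDB([$database]) WITH NO_INFOMSGS;" -QueryTimeout 0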

Let's walk through what's going on here. First, the line $ErrorActionPreference = "Stop" is crucial because it tells our script to stop when it encounters an error. Yes, that's what I want: the job stops, and the error from the cmdlets reaches the SQL Agent job we have driving the process. With this, the job will fail and I'll have a nice log telling me exactly what happened.

Next we have some variables set, including the backup path and the location of the data and log files on the destination system.

Now, here's the Restore-DbaDatabase cmdlet from the dbatools project. This cmdlet will traverse the backup path defined in the -Path parameter, find all the backups and build the restore sequence for you. Yes…really! If you don't pass a parameter defining a point in time, it will build a restore sequence using the most recent backups available in the share. The next few parameters define the destination data and log directories and tell the restore to overwrite the database if it already exists on the destination server. The parameter after that tells the restore to ignore log backups; this is sufficient in my implementation because I'm running full backups daily and don't need point in time recovery. You might, so give it a try. CHECKDB can take a long time…so the final parameter tells Invoke-SqlCmd2 not to time out while running its query.

Next, I run some T-SQL to clean up the databases: for example, I change the recovery model, then shrink the log. This is so I don't have a bunch of production sized log files lying around on the destination system; I do this after each restore, and this way I save a little space. And finally, I run CHECKDB against the database.

If you want to do this for more than one database, you could easily parameterize this code and drive the process with a loop. You’re creative…give it a try.

Now, I take all this and wrap it up in a SQL Agent job.


 Figure 1: SQL Agent Job Step Definition

Using a SQL Agent job, we get automation, reporting and alerting. I'll know average run times, I'll know if the job fails and have a log of why, and it sends me an email with the job's results.

The SQL Agent job type is set to Operating system (CmdExec) rather than PowerShell. We run the job this way because we want to use the latest version of PowerShell installed on our system, in this case version 5.1. The SQL Agent PowerShell job step on SQL 2012 uses, I believe, PowerShell version 4, and when I tried it, it wasn't able to load the dbatools module.

We need to ensure we install dbatools as administrator. This way the module is available to everyone on the system, including the SQL Agent user, not just the user installing the module. Simply run a PowerShell session as administrator and use Install-Module dbatools. If you need more assistance, check out this for help.

From a testing standpoint I confirmed the following things…

  1. When a restore fails, it’s logged to the SQL Agent job’s log, I get an alert.
  2. When one of the Invoke-SqlCmd2 calls fails, it’s logged to the SQL Agent job’s log and I get an alert.
  3. When CHECKDB finds a corruption in a database, it’s logged to the SQL Agent job’s log, the SQL Server Error Log and I get an alert. For testing this I used Paul Randal’s corrupt databases which he has available here.

So in this post we discussed a solution to a common DBA problem: backup, restore and CHECKDB a set of databases. Using dbatools, you can do this with a very simple solution like the one I described here. I like simple; simple is easier to maintain. Certainly there are some features I want to add to this. Specifically, I'd like to write more verbose information into the SQL Agent job's log, or use the job step's ability to log to a file. Using those logs I could easily review the exact runtimes of each restore and CHECKDB.

Give dbatools a try. You won’t be disappointed…really go there now!

Speaking at SQLSaturday Chicago – 600!

Speaking at SQLSaturday Chicago!

I'm proud to announce that I will be speaking at SQLSaturday Chicago on March 11th 2017! And wow, 600 SQLSaturdays! This one won't let you down. Check out the amazing schedule!

If you don't know what SQLSaturday is, it's a whole day of SQL Server training available to you at no cost!

If you haven't been to a SQLSaturday, what are you waiting for? Sign up now!

My presentation is "Networking Internals for the SQL Server Professional".


Here’s the abstract for the talk

Once data leaves your SQL Server, do you know what happens, or is the world of networking a black box to you? Would you like to know how data is packaged up and transmitted to other systems, and what to do when things go wrong? Are you tired of being frustrated with the network team? In this session we introduce how data moves between systems on networks and dig into TCP/IP internals. We'll discuss real world scenarios showing how your network's performance impacts the performance of your SQL Server and even your recovery objectives.