Author Archives: Anthony Nocentino

Top 5 Reasons Why I think SQL Server on Linux is Legit

Here are my top 5 reasons why I thing SQL Server on Linux is Legit!

  1. SQL Server on Linux is Fast – Earlier this year SQL Server on Linux posted the fasted 1TB TPC-H benchmark in the world and at the end of October posted the a 10TB result! Check out the results here and some info on how they did it here and here.
  2. It’s tunable – From and OS standpoint, I think the “tunability” of the operating system is more well documented and well known on Linux. Check out Microsoft’s recommendations here and also Redhat’s here.
  3. Features – If you’re a developer in the Linux ecosphere, this is the reason why you’re evaluating using SQL Server on Linux…there’s likely a feature you want…that you can now have. Check them out here!
  4. Enterprise Support – I like cruising around in forums just like anybody else, but sometimes you have to call support to bring in the people that actually wrote the software.
  5. Availability Solutions – SQL Server has a proven track record for availability, those same concepts and techniques apply to SQL Server on Linux. Backups, Availability Groups, and Failover Clusters check it out.
SQL Server on Linux is a new world, learn the operating system, administration fundamentals and more with me at my online course “SQL Server on Linux: Install, Configure, and Manage” at Brent Ozar Unlimited – sign up now here!

Launching SQL Server on Linux in Single User Mode

There was a question this morning on the SQL Server Community Slack channel from SvenLowry about how to launch SQL Server on Linux in Single User Mode. Well you’ve heard everyone say, it’s just SQL Server…and that’s certainly true and this is another example of that idea.

The command line parameters from the sqlservr binary are passed through into the SQLPAL managed Win32 SQL Process. So let’s check out how to do this together…

First, you’ll want to switch to the user mssql and you can do that with this command

bash-4.2$ sudo su mssql –


What’s happening here is we’re using sudo to switch our user’s security context to the user mssql. This is the account that SQL Server normally runs under. That last dash there is to load the mssql user’s shell, rather than ours.
 
Next, we need to launch the /opt/mssql/bin/sqlservr binary with the -m parameter
 

bash-4.2$ /opt/mssql/bin/sqlservr -m 


Here’s the output from the console while SQL Server is starting up. 

2017-11-09 12:53:18.70 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64) 

Oct 19 2017 02:42:29 

Copyright (C) 2017 Microsoft Corporation

Developer Edition (64-bit) on Linux (CentOS Linux 7 (Core))

2017-11-09 12:53:18.70 Server      UTC adjustment: -6:00

2017-11-09 12:53:18.70 Server      (c) Microsoft Corporation.

2017-11-09 12:53:18.70 Server      All rights reserved.

2017-11-09 12:53:18.70 Server      Server process ID is 4120.

2017-11-09 12:53:18.70 Server      Logging SQL Server messages in file ‘/var/opt/mssql/log/errorlog’.

2017-11-09 12:53:18.70 Server      Registry startup parameters: 

-d /var/opt/mssql/data/master.mdf

-l /var/opt/mssql/data/mastlog.ldf

-e /var/opt/mssql/log/errorlog

2017-11-09 12:53:18.70 Server      Command Line Startup Parameters:

-m

Output omitted…

2017-11-09 12:53:19.50 spid4s      SQL Server started in single-user mode. This an informational message only. No user action is required.

Output omitted…

2017-11-09 12:53:19.90 spid4s      Always On Availability Groups was not started because the SQL Server instance is running in single-user mode.  This is an informational message.  No user action is required.

Output omitted…

2017-11-09 12:53:20.62 spid4s      Recovery is complete. This is an informational message only. No user action is required.


From here, you can go about what ever task it is you needed single user mode for.
 
Administering SQL Server on Linux is a new world, learn the operating system, administration fundamentals and more with me at my online course “SQL Server on Linux: Install, Configure, and Manage” at Brent Ozar Unlimited – sign up now here!

SQL Server on Linux Online Training with Brent Ozar Unlimited!

I’m proud to announce to you that I’m partnering with the folks over a Brent Ozar Unlimited to provide live, instructor-led training directly to you online!

Course – SQL Server on Linux: Install, Configure, and Manage

Description

You’ve been managing SQL Server configuration, backups, and troubleshooting for a couple of years, but that was on Windows. Now, you’re looking at implementing SQL Server 2017 on Linux.

Things are different on Linux.

You want to learn enough Linux to be a good DBA – to install, configure, monitor, and troubleshoot SQL Server on Linux.

I’ve been using Linux in production since y2k was a thing, and I can help. I’m Anthony Nocentino, a Microsoft Data Platform MVP. It feels like I’ve been training for this all my life, and I can finally combine my nearly two decades of Linux experience with my SQL Server expertise to get you up to speed, fast!

In this workshop we’ll dive into SQL on Linux and get things started

  • Linux Fundamentals for DBAs
  • SQL Server on Linux Installation and Configuration
  • SQL on Linux Fundamentals such as SQLPAL and SQLOSv2
  • Administering and Configuring SQL Server on Linux
  • Configuration Best Practices
  • High Availability and Disaster Recovery Options

This is an online class – I’m hosting it in GoToWebinar, live on webcam, taking questions as we go through the material together.

Head on over and sign up now!

ANTHONYfinal1

Figure 1: Not joining A-ha! But hoping you have lots of “a-ha” moments during training.

New Pluralsight Course – SQL Server on Linux Administration Fundamentals

New Pluralsight Course – SQL Server on Linux Administration Fundamentals

My new course “SQL Server on Linux Administration Fundamentals” in now available on Pluralsight here! If you want to learn about the course, check out the trailer here or if you want to dive right in check it out here!
 
This course targets DBAs that design and maintain SQL Server on Linux systems (or those evaluating the technology). This course can be used by both the seasoned DBA to learn foundational Linux skills and also what’s new and different when running SQL Server on Linux. 

Course Description

SQL Server is available on Linux, and management wants you to leverage this shift in technology to more effectively manage your data platform. In this course, SQL Server on Linux Administration Fundamentals, you’ll delve into SQL on Linux in order for you to become an effective DBA.  First, you’ll explore an overview of its architecture, installation, and configuration. Next, you’ll learn how to administer SQL Server on Linux. Finally, you’ll discover high availability and disaster recovery options available to you for keeping your SQL Server online. By the end of this course, you’ll have a solid foundation necessary to utilize SQL Server on Linux in production.

The modules of the course are:

  • Introduction and SQL Server Architecture – Introduce the viewer into world of SQL Server on Linux. Why did Microsoft do this? What’s the strategy? Introduce the SQL Server Ecosphere, such as the database engine, SQL Server Agent and SSIS.
  • Installing and Configuring SQL Server on Linux – We’ll look at our installation and configuration options for SQL Server on Linux, introducing Linux package managers and repositories and install SQL Server on Linux and it’s components.
  • Administering Linux for DBAs – We’ll look at managing services with systemd and how to query journald’s log files for information about SQL Server.  Also dive into file ownership, disk partitioning concepts and mounting file systems and remote file systems.
  • Managing SQL Server on Linux: Administration and Tools – Now that the viewer knows where things are in this new operating system, let’s move up the stack and look at the tooling available for SQL Server on Linux. We’ll cover VS Code, SSMS, SQLCMD and DBFS.
  • High Availability and Disaster Recovery with SQL Server on Linux – Dive into the High Availability and Disaster Recovery options available to SQL Server on Linux
     

Pluralsight Redhat Linux

Check out the course at Pluralsight!

Exit Codes, systemd and SQL Server on Linux

In this blog post we’re going to cover systemd, process exit codes and highlight how systemd reacts in certain exit conditions from SQL Server on Linux. My friend and SQL Server guru Argenis Fernandez – @dbargenis asked about this behavior on Twitter and I’ve been meaning to write this post, so here you go! Also, there’s a Connect item filed by Argenis on this here. Vote!

systemd Basics

Systemd is an initialization daemon, it’s job is to bring the system to usable state. Meaning, it’s responsible for the orderly starting of services on a Linux system. It does much more than that, in fact, one of it’s other core components is journald. Journald stores logging information from systemd units. 

Now with respect to SQL Server on Linux, the information logged into journald is systemd’s information about our mssql.service unit and also the SQL Server Error Log. The SQL Server Error Log lands in here because it’s written to standard out and by default, a service unit’s standard out is written into the journal. 

We can query the information stored in journald with the command journalctl and below is the syntax to query a particular service unit’s log in journald. Running just journalctl will cause the text output to run off the side of the screen without wrapping the text in your terminal. To get the text to wrap, use the following code…then pipe the output into a pager like more or less because you know…less is more. We’ll use this command in the next section.

journalctl -u mssql-server –no-pager | more

Service units in systemd are configured in unit files and SQL Server on Linux’s unit file lives in the file /usr/lib/systemd/system/mssql-server.service. Inside that file we have the following configuration:

# Restart on non-successful exits.

Restart=on-failure

What Restart=on-failure option does, is if a systemd unit returns an unclean exit code, systemd will automatically restart the service. There are other conditions that cause it to restart such as responding to unclean signals, watchdog and also service timeouts. We’ll save those discussions for another day.

Understanding Process Exit Codes

When a process exits, it will return an integer value to the parent process. If a process terminates cleanly, it conventionally returns 0 to the parent process. This means all is well in the world and the process has shut down. In a relational database system this is significant, in that, we really do want our systems to shut down cleanly. Now when things go wrong, that’s when process return a non-zero value.  systemd is the parent process of the sqlservr process and receives it’s exit code. Let’s see what happens in certain exit conditions from SQL Server on Linux.

Let’s say we initiate a shutdown in SQL Server the T-SQL command SHUTDOWN here’s what we’ll get in journald. 

Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51      Server shut down by request from login sa.

Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…

Oct 28 05:28:02 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:28:02.87 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)

 
Now in the output above, you’ll notice a bolded line. In there, you can system that systemd[1] receives a return code from SQL Server of status=1/FAILURE.  Systemd[1] is the parent process to sqlservr, in fact it’s the parent to all processes on our system. It receives the exit code and immediately, systemd initiates a restart of the service due to the configuration we have for our mssql-server systemd unit.
 
What’s interesting is that this happens even on a normal shutdown. But that simply doesn’t make sense, return values on clean exits should return 0. It’s my understanding of the SHUTDOWN command, that it will cause the database engine to shutdown cleanly. 
 
Now let’s say we initiate a SHUTDOWN WITH NOWAIT
 

Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51      Server shut down by NOWAIT request from login sa.

Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…

Oct 28 05:31:27 sqlb.lab.centinosystems.com sqlservr[2035]: 2017-10-28 05:31:27.64 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)


In this output, we see the same behavior, SQL Server returns a status=1/FAILURE and restarts. This makes sense to me, an immediate shutdown should return a non-zero value and it does. 
 
In this post we covered, systemd units and their restart configurations. We looked at exit codes and how they’re used to communicate back to the parent process about the child’s exit status. While it’s just SQL Server…as DBAs we still need to learn about this new operating environment and how it works! Keep learning!
 
For more details on this behavior and it’s configuration I encourage you to read the documentation at this link here. Further, if you’d like a deep dive into systemd check out my Pluralsight course LFCE: Advanced Network and System Administration In this course I cover systemd in great detail, it’s architecture, units and target. 

SQL Server on Linux Content at PASS Summit

PASS Summit is right around the corner and I’ll be there speaking on Monitoring Linux Performance for the SQL Server Admin!

There’s a fantastic amount of SQL Server on Linux content available at Summit. I encourage you to attend one of these sessions. You’ll likely find me at all of these!

Wednesday – 11/1

Thursday – 11/2

Friday – 11/3

 

 

NewImageSummit2017 275x50

SQL Server on Linux – External Memory Pressure

In this blog post we’re going to explore how SQL Server on Linux responds to external memory pressure. On Windows based SQL Server systems we’ve become accustomed to the OS signaling to SQL Server that there’s a memory shortage. When signaled, SQL Server will kindly start shrinking it’s memory caches, including the buffer pool, to maintain overall system stability and usability. We’ll that story is a little different in SQL Server on Linux…let’s look and see how SQL Server on Linux responds to external memory pressure

The Scenario

Our SQL Server on Linux system is a CentOS 7.4 VM, running SQL Server 2017 (RTM). There is 12GB of physical memory and 80% of that is dedicated to SQL Server via mssql-conf about 9.2GB, the default configuration. I’ve also set Max Server memory to 8GB.

So let’s start up SQL Server on Linux and query the buffer pool cache sizes when there’s no data loaded. I’m going to look in two places. sys.dm_os_memory_clerks and also sys.dm_os_buffer_descriptors.  In Figures 1 and 2, we can get a feel for the memory distribution across the system and also inside the buffer pool.

Screen Shot 2017 10 19 at 2 01 04 PM

Figure 1 – Top memory consumers at system startup

Buffer Pool - Cold

 Figure 2 – Buffer Pool memory distribution at system startup

Now, I’m going to load a 8GB table into the buffer pool, maxing out SQL Server’s MAX memory instance setting. Now we’ll look at the allocations again. In Figures 3 and 4, we can see the buffer pool is the dominant consumer of memory and that our database is the dominant consumer in the buffer pool. So far so good, eh?

Memory Clerks - Warm

 Figure 3 – Top memory consumers after server reached maximum memory

Buffer Pool - Warm

 Figure 4 – Buffer Pool memory distribution after table loaded.

Getting Memory Information From Linux

We can use tools like ps, top and htop to look our are virtual and physical memory allocations. We can also look in the /proc virtual file system for our process and look at the status file. In here we’ll find the point in time status of a process, and most importantly the types of memory allocations for a process. We’ll get granular data on the virtual memory allocations and also the resident set size of the process. Here are the interesting values in the status file we’re going to focus on today.

  • VmSize – total current virtual address space of the process
  • VmRSS – total amount of physical memory currently allocated to the process
  • VmSwap – total amount of virtual memory currently paged out to the swap file (disk)

We can use the command pidof sqlservr to find the process ID for our SQL on Linux process, in our case it’s 14689. When running SQL Server on Linux there is always two processes for sqlservr, we’re going to look the one with the higher process ID.

Now to access the status information for our process we look in /proc/14689/status

Below is the output from our process’ status file, filtering for the values we’re interested in. You can see about 10.16GB of Virtual Memory (VmSize), 7.82GB of which is in actual RAM (VmRSS) and 622MB in the swap file (VmSwap). Nothing special to see here, just the baseline information for our process with a populated buffer pool.

VmSize: 10661380 kB

VmRSS:   8201240 kB

VmSwap:   636952 kB

Let’s Add Some Pressure

Using a small C program I wrote, I use calloc to allocate large sections of memory and then I have the program continually write data into the memory buffer to ensure those pages stay in physical memory. Using this program, let’s allocate 4GB of RAM. I’m choosing this value because it’s going to cause my system to exceed it’s physical memory but NOT exceed the amount virtual memory in my system when including the swap file’s size. Things can get really dicey if you exceed physical and virtual memory size, processes will start to get killed. Microsoft documents a case here.

Let’s look at the memory data from /proc/17141/status for my memory allocator program. In the output below you can see we have just about 4GB of virtual memory (VmSize) allocated and nearly the same amount of memory that’s resident in physical memory (VmRSS) and no data in the swap file.

VmSize:  4198476 kB

VmRSS:   4194484 kB

VmSwap:        0 kB

 

Now that 4GB program has caused Linux to need to make room in physical memory for this process. It does this by swapping least recently used pages from memory out to disk. So under memory pressure, let’s look at the SQL Server process’ memory allocations according to Linux. In the output below we see we still have a VmSize of around 10GB, but our VmRSS value has decreased dramatically. In fact, our VmRSS is now only 3.53GB. It decreased by 4.29GB!  VmSwap has increased to 5.04GB. Wow, that’s a huge portion of the process swapped to disk. 

VmSize: 10843320 kB

VmRSS:   3705476 kB

VmSwap:  5289752 kB

What Does SQL Server Think About All of This?

With our memory allocation program running and no workload running inside SQL Server, let’s look at sys.dm_os_memory_clerks and sys.dm_os_buffer_descriptors again. Something interesting happens, in Figures 5 and 6, when we look at sys.dm_os_memory_clerks and sys.dm_os_buffer_descriptors things changed but only slightly  But from the data above we do know that only 3.54GB of the process is actually in memory (VmRSS) at this point. Which means a large portion of SQL Server’s process is actually NOT in memory anymore…they’re on disk. The reality is, we simply do not know which portions of the process are in memory or on disk at this level.  You could use tools like pmap to determine which sections of memory are on disk, but we’ll leave that for another day. Try it out…it’s fun ;)

Swapped out? Why not?

Figure 5 – Top memory consumers after server reached maximum memory and with a significant portion of the process swapped to disk. Just about the same as it was without external memory pressure.

Buffer Pool - Warm but paged

Figure 6 – Buffer Pool memory distribution after table loaded and with a significant portion of the process swapped to disk. Just about the same as it was without external memory pressure.

Things aren’t what they seem

Looking at wait stats during query execution is going to be a little more interesting when SQL Server thinks it’s data pages are in memory, but a significant portion of them are actually on disk. In fact, when querying my test table with a simple SELECT COUNT(*) the wait types don’t indicate the fact that SQL Server is now actually disk bound due to the large portions of the buffer pool being paged out to disk. Under normal conditions, this query takes 11 seconds to run…when we’re paging to disk, it takes 2 minutes and 11 seconds to run. 

So let’s check out the wait types reported during our query output here in Figure 7. We see SOS_SCHEDULER_YIELD…which makes sense…long running query, that’s pretty slow and it’s quantum is being exhausted and the thread is being rescheduled. We do see some disk IO waits with PAGEIOLATCH_SH, but not much…what’s happening here is my data table is 8GB and my Max server memory is also 8GB so there’s a little overflow there and will read from disk for some of the data. From a performance standpoint, SQL Server has no idea that it’s data pages are likely swapped out to in this low memory situation. It thinks those pages are in the buffer pool…they certainly are…but a large chunk of the buffer pool is now actually swapped out to disk. 

Wait stats, who needs wait stats?

Figure 7 – Wait types reported during query execution with a significant portion of the process swapped to disk. Wait stats query from @brento 

What does this all mean?

Well, SQL Server is on a new operating system now, Linux. It’s upon us as DBAs to know how our system works as a whole and what to do when we’re in situations where performance falls off a cliff due to memory contention. This case is certainly a contrived case by putting so much external pressure on our process…but you, the DBA, need to know how SQL Server on Linux is going to react in these conditions and how to get your system back on track when things go south. Good monitoring and also good system resource management are needed. Certainly a properly size system in terms of physical memory and swap file size is needed. We toed the line here in our example ensuring we didn’t exceed physical memory plus swap file size. If you do exceed physical and swap size memory allocations will fail and your process will likely crash or be killed.

Speaking of monitoring, I’m speaking at PASS Summit this year on…you guessed “Monitoring Linux Performance for the SQL Server Admin” on Friday November 3th at 8am. If you want to learn more, I’ll see you there. 

Speaking at IT/Dev Connections 2017 – San Francisco!

I’m proud to announce that I will be delivering two sessions at IT/Dev Connections in San Francisco! This is my second year at IT/Dev Connections, real content for IT pros!

Networking Internals for the SQL Server Professional

Tuesday, 10/24/2017: 1:15 pm – 2:30 pm

Room: Contiental 7

Abstract

Once data leaves your SQL Server do you know what happens or is the world of networking a black box to you? Would you like to know how data is packaged up and transmitted to other systems and what to do when things go wrong? Are you tired of being frustrated with the network team? In this session we introduce how data moves between systems on networks and TCP/IP internals. We’ll discuss real world scenarios showing you how your network’s performance impacts the performance of your SQL Server and even your recovery objectives.

Learning Objectives

Networking Fundamentals, TCP/IP, Network Performance and how these impact SQL Server.

Linux OS Fundamentals for the SQL Admin

Wednesday, 10/25/2017: 1:15 pm – 2:30 pm

Abstract

PowerShell and SQL Server are now available on Linux and management wants you to leverage this shift in technology to more effectively manage your systems, but you’re a Windows admin, Don’t fear! It’s just an operating system. It has all the same components Windows has and in this session we’ll show you that. We will look at the Linux operating system architecture and show you how to interact with and manage a Linux system. By the end of this session you’ll be ready to go back to the office and get started working with Linux.

Learning Objectives

In this session we’ll cover the following  

  • Process control
  • Service control
  • Package installation
  • System resource management (CPU, disk and memory)
  • Using PowerShell to interact with Linux systems

IT/Dev Connections

New Pluralsight Course – LFCE: Linux Service Management – HTTP Services

My new course “LFCE: Linux Service Management – HTTP Services” in now available on Pluralsight here! If you want to learn about the course, check out the trailer here or if you want to dive right in check it out here!

This course targets IT professionals that design and maintain RHEL/CentOS based enterprises. It aligns with the Linux Foundation Certified System Administrator (LFCS) and Linux Foundation Certified Engineer (LFCE) and also Redhat’s RHCSA and RHCE certifications. The course can be used by both the IT pro learning new skills and the senior system administrator preparing for the certification exam

Let’s take your LINUX sysadmin skills to the next level and get you started on your LFCS/LFCE learning path.

The modules of the course are:

  • Internet Architecture and Domain Name System – Overview of the Internet, the World Wide Web and DNS
  • Installing and Configuring HTTP Service with Apache – Let’s get Apache installed and configured and review some common configuration use cases
  • Implementing Security Services with Apache – Overview of encryption and digital certificates, configuring HTTPS services and HTTP access controls
  • Troubleshooting and Logging in Apache – Apache logging architecture, configuring logging, log formats and accessing log data

Pluralsight Redhat Linux

Check out the course at Pluralsight!

Instant File Initialization in SQL Server on Linux

Earlier this week Ned Otter (@NedOtter) brought up a question about Instant File Initialization on SQL Server on Linux, check out the thread here. I was up way too early in the morning, as I normally am, so I decided to poke around and see how it was done. SQL Server pros, here you can see you can get some deep internal information from the OS very easily. Hopefully with this blog post you’ll be able to compare how this is done on Windows and draw the connections between the two platforms and leverage this technique in other areas.

Let’s check it out…

SQL on Linux Internals

First, the internals of SQL Server on Linux leverage a process virtualization technique called SQLPAL. Inside SQLPAL, is a Win32 environment custom tailored to support SQL Server. Next, SQLPAL needs a way to talk to Linux so that it can access the physical resources of the system and it does this via something called the Host Extensions. Essentially the HE map SQLPAL’s Win32 and SQLOSv2 API calls to Linux system calls. In Linux, system calls provide access to system resources, things like CPU, memory and network or disk I/O. And that’s the flow, SQL Server calls SQLPAL, which calls the Host Extensions, which calls Linux’s system calls to interact with system resources.

Obligatory SQLPAL Image

Figure 1: SQLPAL

Using strace to Collect System Calls

Knowing this, we can leverage Linux tools to see what SQL Server is doing when it interacts with the operating system. Specifically we can leverage tools like strace to see which systems calls it uses to perform certain tasks and in this case we’re going to look at how SQL on Linux implements Instance File Initialization. So let’s attach strace to our SQL Server on Linux process, sqlservr.

It’s needless to say, do not reproduce this on a system that’s important to you. Tracing will SLOW things down.

Attach strace to your currently running SQL Server process (sqlservr). So let’s start with finding our SQL Server process with ps. There is always two (insert Sith reference) you’re going to want the child process here. Easily identified by the one with the higher process ID. So we’ll pick 1416.

[root@server2 ~]# ps -aux | grep sqlservr

mssql      1414  3.0  0.4 198156 18244 ?        Ssl  06:23   0:04 /opt/mssql/bin/sqlservr

mssql      1416  6.3 15.9 1950768 616652 ?      Sl   06:23   0:08 /opt/mssql/bin/sqlservr

Let’s walk through the strace parameters here, -t adds a time stamp, -f will attach strace to any threads forked from our traced process and -p is the process we want to trace .

strace -t -f -p 1416 -o new_database.txt

Creating a Database with Instant File Initialization

With strace up and running let’s turn on the trace flags to enable output for Instant File Initialization and create database that has a 100MB data file and a 100MB log file. Check out this post from Microsoft for more details on the trace flags. This database create code is straight from their post. I changed the model database’s data and log file sizes to 100MB each. Also, it’s important to note Instance File Initialization is only for data files, log files are zeroed out due to requirements for crash recovery. We’re going to see that in action in a bit…

DBCC TRACEON(3004,3605,-1)

GO

CREATE DATABASE TestFileZero

GO

EXEC sp_readerrorlog

GO

DROP DATABASE TestFileZero

GO

DBCC TRACEOFF(3004,3605,-1)

Once the database creation is done, stop your strace and let’s go and check out the data gathered in the file.

Poking Around in Linux Internals, Creating the MDF

Inside your output file you’re going to see a collection of system calls. In Linux a system call is the way a user space program can ask the kernel to do some work. And in this case SQL Server on Linux is asking the kernel to create a data file, create a log file and zero out the log file. So let’s check out what happens when the MDF is created.

1630  09:03:28.393592 open(“/var/opt/mssql/data/TestFileZero.mdf”, O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660) = 154

First thing, that 1630, that’s the process ID of the thread that’s doing the work here. That PID is different than the one we attached strace to because it’s a thread gets created when we execute our database create statements.
Next, you see a call to open, it’s opening the file TestFileZero.mdf. The next parameter are flags to tell open what to do, in this case O_RDWR opens the file for read/write access, O_CREAT creates a file, O_EXCL prevents open from overwriting the file if it exists, and O_DIRECT enables synchronous I/O to the file and disables the file system cache, 0660 is the file mode and the return value is 164…this is the file descriptor for the file created. A file descriptor (fd) is used to represent and provide access to the file that was just opened. We’ll pass this to other system calls so they can interact with the file addressed by the fd.
 

1630  09:03:29.087471 fallocate(154, 0, 0, 104857600 <unfinished …> = 0

1630  09:03:29.087579 <… fallocate resumed> ) = 0

 
Next, we see a call to fallocate on the file descriptor 154, the first 0 is the mode, which tells fallocate to allocate disk space within the range specified in the third and forth parameters, offset and length respectively. And here is from 0 to 100MB. If you read the man page for fallocate, there is a FALLOC_FL_ZERO_RANGE flag that can be passed into the mode parameter. In this call, mode is set to 0 so this flag is not set. Further, the man page indicates that this flag is supported on XFS in Linux kernel 3.15. I’m on 3.10 using CentOS 7.4. So there’s no zeroing magic happing at the file system level.
 

1630  09:03:29.087603 ftruncate(154, 104857600) = 0

 
Next, there’s a call to ftruncate on fd 154. This call sets the length of the file to the parameter passed in this case 100MB.
 

1630  09:03:29.091223 io_submit(140030075985920, 1, [{data=0x185b4dc48, pwrite, fildes=154, str=”\1\v\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0c\0\0\0\2\0\374\37″…, nbytes=8192, offset=8192}]) = 1

 
Next, we’ll see a sequence of calls using io_submit, which submits asynchronous I/O to disk. The parameters for this one are an aio_context_t, the number of blocks in the request, then an array of AIO control blocks. The AIO control blocks (struct iocb) are what’s inside the brackets [], the key parameters are pwrite and filedes. That’s a write operation and the file descriptor which matches the fd that we’ve been working with, 154. str is the actual data, then the number of bytes in this operation followed by the offset, which is the starting point of the IO.
There are 17 write operations, in my strace output, these are laying out the mdf file, which has a header and other metadata through the file, but it’s certainly not zeroing the file out. We’ll see how zeroing works when we get to the LDF. Here’s a link to the definition to the IO Control Block (struct iocb) and more on asynchronous I/O in Linux.

1630  09:03:29.098452 fsync(154 <unfinished …>

1630  09:03:29.098640 <… fsync resumed> ) = 0

 
The fsync call instructs the file descriptor to flush all buffers to disk. 

1630  09:03:29.099260 close(154)        = 0

 
The close call closes the file descriptor, releases any locks and allows the file to be reused by other processes.

Poking Around in Linux Internals, Creating and Zeroing the Transaction Log File

So that was the creation of the data file, now let’s check out how the transaction log file is created. Instant File Initialization only applies to data files, transaction log files must be zeroed out for crash recovery. Let’s dig in. 

1630  09:03:29.831413 open(“/var/opt/mssql/data/TestFileZero_log.ldf”, O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660) = 154

We see an open call again, nearly identical, and the file descriptor returned is again 154. 
 

1630  09:03:30.395757 fallocate(154, 0, 0, 104857600) = 0

There’s a call to fallocate to provision the underlying storage.
 

1630  09:03:30.395814 ftruncate(154, 104857600) = 0

 
Then we see a call to truncate again to ensure the size of the file is 100MB.
 

1630  09:03:30.396466 fsync(154 <unfinished …>

1630  09:03:30.397672 <… fsync resumed> ) = 0

 
Then there’s a call to fsync again, flushing buffers to disk.

1630  09:03:30.400042 write(1, “Z”, 1)  = 1

1630  09:03:30.400088 write(1, “e”, 1)  = 1

1630  09:03:30.400134 write(1, “r”, 1)  = 1

1630  09:03:30.400180 write(1, “o”, 1)  = 1

1630  09:03:30.400246 write(1, “i”, 1)  = 1

1630  09:03:30.400301 write(1, “n”, 1)  = 1

1630  09:03:30.400348 write(1, “g”, 1)  = 1

…output omitted

 
Now things get special…we see a series of write calls. This write call isn’t writing to the file…it’s writing to standard out, as indicated by the first parameter which is 1. 1 is the file descriptor for standard out. The second parameter is the data to be written out, in this case you can see it’s a single character, the third parameter is the size of the data being written. The return value, that’s the last 1 on the line, that’s the number of bytes written to the file. And guess where this data is being sent too…the SQL Server Error log! See the string “Zeroing”? The SQL Server Error Log is written to file and to standard out. Very cool. The fact that it hits standard out and it’s systemd service unit means the SQL Server Error Log also does into journald. Very cool!
Zeroing transaction log start

1630  09:03:30.406250 io_submit(140030075985920, 1, [{data=0x185b62308, pwritev, fildes=154, iovec=[{“\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300″…, 4096}…], offset=0}]) = 1

…Output omitted

1630  09:03:30.454831 io_submit(140030075985920, 1, [{data=0x185b4dc48, pwritev, fildes=154, iovec=[{“\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300″…, 4096}…], offset=100663296}]) = 1

Now for the log file, we’re going to see io_submit calls again, this time MANY more, from the output here it shows the starting offset = 0 and an ending offset = 100663296, there’s many io_submit calls  in between. If I did the math correctly, each io_submit writes 4MB of data, this last io_submit call is starting offset is at 96MB, plus that IO then we have our zeroed 100MB log file.
The difference in the time stamps between the first call and the start of the last call 48.581ms. strace’s time stamps are in microseconds. 

1630  09:03:30.460172 write(1, “Z”, 1)  = 1

1630  09:03:30.460215 write(1, “e”, 1)  = 1

1630  09:03:30.460259 write(1, “r”, 1)  = 1

1630  09:03:30.460315 write(1, “o”, 1)  = 1

1630  09:03:30.460361 write(1, “i”, 1)  = 1

1630  09:03:30.460406 write(1, “n”, 1)  = 1

1630  09:03:30.460450 write(1, “g”, 1)  = 1

 
When the io_submit calls are finished, we see a series of writes to standard out and the same data is in the SQL Error log and we see that zeroing is finished and it took 60ms…very close to what was reported by strace’s time stamps! There are additional calls from the printing of the start zeroing message to the stop zeroing message that are not included in my output here, plus the time that last IO takes to complete. That’s why there’s a variance in strace’s reported time and SQL Error Log’s reported time.
Zeroring transaction log finish
 

1630  09:03:30.469088 fsync(154 <unfinished …>

1630  09:03:30.469532 <… fsync resumed> ) = 0

 
We call fsync to flush our buffers for this file. 
 

1630  09:03:30.469977 close(154)        = 0

Then we close the file.
In your strace data, you’ll see another sequence of zeroing in the file for zeroing the tail of the log, here’s the output from the SQL Error Log. The process is similar with the exception of the data. There are some non-zero values being written at the end of the file for crash recovery reasons.

Zeroring tail of the log

And there we have it, this is how Instant File Initialization works on Linux and some Linux kernel internals for you SQL Pros!