Category Archives: Linux

Attempting to Run SQL on Linux Inside Windows Subsystem for Linux

Shawn Melton MVP and dbatools contributor last week had an issue running SQL Server on Linux inside of Windows Subsystem for Linux.

I didn’t want to leave a brother hanging so I spent this morning digging into this a little bit. 

Reproducing the Issue

The first thing I had to do was reproduce the issue. So on my Windows 10 test VM I installed the Windows Subsystem for Linux, steps to do so are here and I installed the Ubuntu app.

Then, I fired up a bash shell using WSL and then I installed SQL Server on Linux for Ubuntu as documented here

Now, I completed the installation of SQL Server on Linux using mssql-conf when that program completes it attempts to start SQL Server on Linux. BOOM! I’m able to reproduce the same error.

Looking at the error, I decided to see if I could run SQL Server on Linux from the shell as the user mssql. This would remove systemd and mssql-conf from the picture. Basically I wanted to see if I could get another, more descriptive, error to pop out. To do that we’ll need to change over to the mssql user with su.

sudo su mssql -

And then change into the working directory for SQL Server on Linux and try to launch SQL Server.

cd /var/opt/mssql/
/opt/mssql/bin/sqlservr

Now, doing that…generates same same error! Here’s the error in a search engine friendly form :)

mssql@DESKTOP:~$ /opt/mssql/bin/sqlservr
This program has encountered a fatal error and cannot continue running.
The following diagnostic information is available:
Reason: 0x00000003
Message: fd != -1
Stacktrace: 00007f818942d4d3 00007f8188de76ba 00007f81863e73dd
Process: 79 - sqlservr
Thread: 80
Instance Id: 50bd6e1b-8f6c-45b3-939d-2338725d8b4a
Crash Id: d38007c0-48c6-4374-9205-5539333138ff
Build stamp: 5fb3474a5f63ad2f4b7eddadad44a086839721f18a66c5fb5d7cfcce25c0f539
This program has encountered a fatal error and cannot continue running.
The following diagnostic information is available:
Reason: 0x00000003
Message: fd != -1
Stacktrace: 00007f818942d6ac 00007f8188de76ba 00007f81863e73dd
Process: 81 - sqlservr
Thread: 83
Instance Id: 50bd6e1b-8f6c-45b3-939d-2338725d8b4a
Crash Id: d38007c0-48c6-4374-9205-5539333138ff
Build stamp: 5fb3474a5f63ad2f4b7eddadad44a086839721f18a66c5fb5d7cfcce25c0f539
*********** PANIC CORE DUMP GENERATION FAILED **********
Attempt to launch handle-crash.sh failed with error 0x0000000C
Aborted (core dumped)

Digging a Little Deeper

So now with the same error output, I decided to give it a cursory pass with strace to see if I could find anything that would put us closer to why SQL Server on Linux won’t start when using Windows Subsystem for Linux.

What you see in the strace output is the parent process creating the child sqlservr process and failing. In the first line of output you can see process 137 clone and return process ID 139. Which is how a parent process creates a child in Linux. Then process 139 tries to perform some setup operations like registering signal actions (rt_sigaction) and their corresponding routines to call when that signal is received by that process.

Now the only error I found in the output is the prctl call which returns invalid argument.This system call is to perform operations on a process.  On my WSL system the option being set PR_SET_PTRACER is for the Yama LSM subsystem which lives in /proc/sys/kernel/yama normally. This doesn’t exist on my Ubuntu WSL installation. I checked my CentOS full VMs and this exists. I checked a full Ubuntu installation and it’s there too.

After the error SQL Server calls tgkill and kills itself with the SIGABRT signal. A dump occurs and the program exits. 

137 clone(child_stack=0x7fb1a0feff30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb1a0ff09d0, tls=0x7fb1a0ff0700, child_tidptr=0x
7fb1a0ff09d0) = 139
139 set_robust_list(0x7fb1a0ff09e0, 24 <unfinished ...>
137 fcntl(1, F_SETFL, O_RDONLY|O_APPEND <unfinished ...>
139 <... set_robust_list resumed> ) = 0
137 <... fcntl resumed> ) = 0
139 gettid( <unfinished ...>
137 fcntl(2, F_SETFL, O_RDONLY|O_APPEND <unfinished ...>
139 <... gettid resumed> ) = 139
137 <... fcntl resumed> ) = 0
139 rt_sigaction(SIGABRT, {0x7fb1a6a2e470, [ABRT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, <unfinished ...>
137 getrlimit(RLIMIT_NOFILE, <unfinished ...>
139 <... rt_sigaction resumed> {0x7fb1a6a2d290, [ABRT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
137 <... getrlimit resumed> {rlim_cur=1024, rlim_max=4*1024}) = 0
139 rt_sigaction(SIGILL, {0x7fb1a6a2e470, [ILL], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, <unfinished ...>
137 setrlimit(RLIMIT_NOFILE, {rlim_cur=4*1024, rlim_max=4*1024} <unfinished ...>
139 <... rt_sigaction resumed> {0x7fb1a6a52790, [], SA_RESTORER|SA_STACK|SA_NODEFER|SA_SIGINFO, 0x7fb1a63f1390}, 8) = 0
137 <... setrlimit resumed> ) = 0
139 rt_sigaction(SIGFPE, {0x7fb1a6a2e470, [FPE], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, <unfinished ...>
137 gettid( <unfinished ...>
139 <... rt_sigaction resumed> {0x7fb1a6a52790, [], SA_RESTORER|SA_STACK|SA_NODEFER|SA_SIGINFO, 0x7fb1a63f1390}, 8) = 0
137 <... gettid resumed> ) = 137
139 rt_sigaction(SIGSEGV, {0x7fb1a6a2e470, [SEGV], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, <unfinished ...>
137 rt_sigprocmask(SIG_BLOCK, [TERM], <unfinished ...>
139 <... rt_sigaction resumed> {0x7fb1a6a52790, [], SA_RESTORER|SA_STACK|SA_NODEFER|SA_SIGINFO, 0x7fb1a63f1390}, 8) = 0
137 <... rt_sigprocmask resumed> NULL, 8) = 0
139 rt_sigaction(SIGBUS, {0x7fb1a6a2e470, [BUS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, <unfinished ...>
137 rt_sigtimedwait([TERM], NULL, NULL, 8 <unfinished ...>
139 <... rt_sigaction resumed> {0x7fb1a6a52790, [], SA_RESTORER|SA_STACK|SA_NODEFER|SA_SIGINFO, 0x7fb1a63f1390}, 8) = 0
139 rt_sigaction(SIGTRAP, {0x7fb1a6a2e470, [TRAP], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a52790, [], SA_RESTORER|SA_STACK|SA_NODEFER|SA_SIGINFO, 0x7fb1a63f1390}, 8) = 0
139 rt_sigaction(SIGSYS, {0x7fb1a6a2e470, [SYS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2d290, [SYS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGXCPU, {0x7fb1a6a2e470, [XCPU], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2d290, [XCPU], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGXFSZ, {0x7fb1a6a2e470, [XFSZ], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2d290, [XFSZ], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGSTKFLT, {0x7fb1a6a2e470, [STKFLT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2d290, [STKFLT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb1a03f0000
139 munmap(0x7fb1a03f0000, 4194304) = 0
139 mmap(NULL, 8384512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb19fff0000
139 munmap(0x7fb19fff0000, 65536) = 0
139 munmap(0x7fb1a0400000, 4124672) = 0
139 open("/proc/self/status", O_RDONLY) = 41
139 fstat(41, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
139 read(41, "Name:\tsqlservr\nState:\tS (sleepin"..., 4096) = 663
139 close(41) = 0
139 rt_sigaction(SIGABRT, {0x7fb1a6a2d290, [ABRT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [ABRT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGILL, {0x7fb1a6a2d290, [ILL], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [ILL], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGFPE, {0x7fb1a6a2d290, [FPE], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [FPE], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGSEGV, {0x7fb1a6a2d290, [SEGV], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [SEGV], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGBUS, {0x7fb1a6a2d290, [BUS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [BUS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGTRAP, {0x7fb1a6a2d290, [TRAP], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [TRAP], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGSYS, {0x7fb1a6a2d290, [SYS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [SYS], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGXCPU, {0x7fb1a6a2d290, [XCPU], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [XCPU], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGXFSZ, {0x7fb1a6a2d290, [XFSZ], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [XFSZ], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 rt_sigaction(SIGSTKFLT, {0x7fb1a6a2d290, [STKFLT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, {0x7fb1a6a2e470, [STKFLT], SA_RESTORER|SA_RESTART, 0x7fb1a39154b0}, 8) = 0
139 prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY) = -1 EINVAL (Invalid argument)
139 prctl(PR_SET_PDEATHSIG, SIG_0) = 0
139 open("/proc/self/status", O_RDONLY) = 41
139 fstat(41, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
139 read(41, "Name:\tsqlservr\nState:\tS (sleepin"..., 4096) = 663
139 close(41) = 0
139 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
139 tgkill(137, 139, SIGABRT) = 0
139 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=137, si_uid=999} ---
139 gettid() = 139
139 write(2, "Dump collecting thread [139] hit"..., 57) = 57
139 exit_group(-1) = ?
137 +++ exited with 255 +++
138 +++ exited with 255 +++
139 +++ exited with 255 +++

What’s Really Happening?

Well I think something is missing from Windows Subsystem for Linux. Is it the Yama stuff…perhaps. But clearly SQL Server isn’t happy with the environment and kills itself. I haven’t dove into WSL yet and I don’t know how it’s implemented, but there could also be something up at that level too. Generally I don’t write blog posts where I don’t know exactly what’s going on, but I did want to let folks know that SQL on Linux doesn’t work on Windows Subsystem for Linux. 


A Novel Idea for High Availability in SQL Server on Linux

Over the past year we’ve learned about how SQL Server on Linux is implemented, leveraging SQLPAL and the team is pretty confident in their architectural decisions as indicated in this post here.

Now that there is this wrapper around SQL Server, this really opens up some interesting opportunities…perhaps we can leverage SQLPAL to facilitate some new high availability techniques.

When I was in graduate school, I worked on a research project, that became my master’s thesis. In this work, I developed a technique that synchronized the process address space of a virtual machine on two separate physical hypervisors.The technique involved an initial copy of all pages between the two systems and then selectively copying the virtual machine’s pages as they became dirty. Using this technique, the process address space of the virtual machine is synchronized between the two hypervisors. This allows for a significant reduction in the amount of information that had to be replicated between the hypervisors but more importantly…the virtual machines memory in sync which meant if hypervisor hosting the virtual machine crashed we could theoretically start the virtual machine on the second hypervisor.

Now, during my PASS Summit talk this year, I presented to the audience my theory that SQLPAL is virtualization. But it’s not machine virtualization, it’s process virtualization. Which means there’s a purpose built environment hosting the SQL Server process. This environment, SQLPAL, is the main allocator of resources from the physical system. It’s the thing that asks for memory, disk, network anything that’s needed from the underlying operating system.

Now, what if we took these two ideas and brought them together? What if SQLPAL was able to synchronize the program state and resources between two separate systems? Could we provide highly available SQL Services with a technique like this? I think we can. Perhaps we don’t even synchronize the pages between the system. Perhaps an even lighter technique could be used, such as duplicating the system calls between the two copies of SQL Server and thus implicitly synchronizing the program state.

Think about the possibilities…we could have a system that fails over with all the context of the currently active system, active connections could stay active, buffer pool populated, plan cache could still exist and not have to be rebuilt. Yes, we’ll likely need some sort of low latency, high bandwidth interconnect..but we have those. And there’s certainly more implementation details that need to be thought through…but I think there’s something here. 

A couple questions I thought of while writing this…

1. Does this provide more value than Availability Groups? I think so…program state remains in sync between the two systems. So things like user connections could be maintained during failover (with the appropriate relocation of the IP of course). I also think the quorum model would be simpler, as there is only one pair in the synchronization.

2. Does this provide more value than virtual machine migration, perhaps. This technique could be hypervisor independent.

I’d love to hear your thoughts on this! Most of all I want you to start thinking about new ways we can leverage SQLPAL and it’s abstraction from hardware.

I’m Presenting Two Linux Sessions on One Day!

On 12/13 I’m presenting two, back to back, sessions on SQL Server on Linux online. So you can attend from anywhere!

Let’s go through both!

First, on 12/13 at 1PM Central, I’m presenting for the PASS Database Administration Virtual Group here’s my session details:

Topic: Linux OS Fundamentals for the SQL Admin

Registration: You must register if you want to attend. You can register at http://dba.pass.org/. When you register, you will receive a link to the meeting. All attendees will be entered into a raffle for a $25 gift card.

Abstract: SQL Server and PowerShell are now available on Linux and management wants you to leverage this shift in technology to more effectively manage your systems, but you’re a Windows admin! Don’t fear! It’s just an operating system! It has all the same components Windows has and in this session, we’ll show you that. We will look at the Linux operating system architecture and show you how to interact with and manage a Linux system. By the end of this session you’ll be ready to go back to the office and get started working with Linux with a fundamental understanding of how it works.

 

Second, on 12/13 at 2PM Central, I’m presenting at the PASS Marathon: Linux Edition here’s my session details:

Topic: Monitoring Linux Performance for the SQL Server Admin

Registration: You must register if you want to attend. You can register at http://www.pass.org/marathon/2017/december/Registration.aspx.

Abstract: So you’re a SQL Server administrator and you just installed SQL Server on Linux. It’s a whole new world. Don’t fear, it’s just an operating system. It has all the same components Windows has and in this session we’ll show you that. We will look at the Linux operating system architecture and show you where to look for the performance data you’re used to! Further we’ll dive into SQLPAL and how it architecture and internals enables high performance for your SQL Server. By the end of this session you’ll be ready to go back to the office and have a solid understanding of performance monitoring Linux systems and SQL on Linux. We’ll look at the core system components of CPU, Disk, and Memory monitoring techniques for each and look some of the new tools available including new DMVs and DBFS.
Hope to see you online!

NewImage

Top 5 Reasons Why I think SQL Server on Linux is Legit

Here are my top 5 reasons why I thing SQL Server on Linux is Legit!

  1. SQL Server on Linux is Fast – Earlier this year SQL Server on Linux posted the fasted 1TB TPC-H benchmark in the world and at the end of October posted the a 10TB result! Check out the results here and some info on how they did it here and here.
  2. It’s tunable – From and OS standpoint, I think the “tunability” of the operating system is more well documented and well known on Linux. Check out Microsoft’s recommendations here and also Redhat’s here.
  3. Features – If you’re a developer in the Linux ecosphere, this is the reason why you’re evaluating using SQL Server on Linux…there’s likely a feature you want…that you can now have. Check them out here!
  4. Enterprise Support – I like cruising around in forums just like anybody else, but sometimes you have to call support to bring in the people that actually wrote the software.
  5. Availability Solutions – SQL Server has a proven track record for availability, those same concepts and techniques apply to SQL Server on Linux. Backups, Availability Groups, and Failover Clusters check it out.

Launching SQL Server on Linux in Single User Mode

There was a question this morning on the SQL Server Community Slack channel from SvenLowry about how to launch SQL Server on Linux in Single User Mode. Well you’ve heard everyone say, it’s just SQL Server…and that’s certainly true and this is another example of that idea.

The command line parameters from the sqlservr binary are passed through into the SQLPAL managed Win32 SQL Process. So let’s check out how to do this together…

First, you’ll want to switch to the user mssql and you can do that with this command

bash-4.2$ sudo su mssql –


What’s happening here is we’re using sudo to switch our user’s security context to the user mssql. This is the account that SQL Server normally runs under. That last dash there is to load the mssql user’s shell, rather than ours.
 
Next, we need to launch the /opt/mssql/bin/sqlservr binary with the -m parameter
 

bash-4.2$ /opt/mssql/bin/sqlservr -m 


Here’s the output from the console while SQL Server is starting up. 

2017-11-09 12:53:18.70 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64) 

Oct 19 2017 02:42:29 

Copyright (C) 2017 Microsoft Corporation

Developer Edition (64-bit) on Linux (CentOS Linux 7 (Core))

2017-11-09 12:53:18.70 Server      UTC adjustment: -6:00

2017-11-09 12:53:18.70 Server      (c) Microsoft Corporation.

2017-11-09 12:53:18.70 Server      All rights reserved.

2017-11-09 12:53:18.70 Server      Server process ID is 4120.

2017-11-09 12:53:18.70 Server      Logging SQL Server messages in file ‘/var/opt/mssql/log/errorlog’.

2017-11-09 12:53:18.70 Server      Registry startup parameters: 

-d /var/opt/mssql/data/master.mdf

-l /var/opt/mssql/data/mastlog.ldf

-e /var/opt/mssql/log/errorlog

2017-11-09 12:53:18.70 Server      Command Line Startup Parameters:

-m

Output omitted…

2017-11-09 12:53:19.50 spid4s      SQL Server started in single-user mode. This an informational message only. No user action is required.

Output omitted…

2017-11-09 12:53:19.90 spid4s      Always On Availability Groups was not started because the SQL Server instance is running in single-user mode.  This is an informational message.  No user action is required.

Output omitted…

2017-11-09 12:53:20.62 spid4s      Recovery is complete. This is an informational message only. No user action is required.


From here, you can go about what ever task it is you needed single user mode for.
 

New Pluralsight Course – SQL Server on Linux Administration Fundamentals

New Pluralsight Course – SQL Server on Linux Administration Fundamentals

My new course “SQL Server on Linux Administration Fundamentals” in now available on Pluralsight here! If you want to learn about the course, check out the trailer here or if you want to dive right in check it out here!
 
This course targets DBAs that design and maintain SQL Server on Linux systems (or those evaluating the technology). This course can be used by both the seasoned DBA to learn foundational Linux skills and also what’s new and different when running SQL Server on Linux. 

Course Description

SQL Server is available on Linux, and management wants you to leverage this shift in technology to more effectively manage your data platform. In this course, SQL Server on Linux Administration Fundamentals, you’ll delve into SQL on Linux in order for you to become an effective DBA.  First, you’ll explore an overview of its architecture, installation, and configuration. Next, you’ll learn how to administer SQL Server on Linux. Finally, you’ll discover high availability and disaster recovery options available to you for keeping your SQL Server online. By the end of this course, you’ll have a solid foundation necessary to utilize SQL Server on Linux in production.

The modules of the course are:

  • Introduction and SQL Server Architecture – Introduce the viewer into world of SQL Server on Linux. Why did Microsoft do this? What’s the strategy? Introduce the SQL Server Ecosphere, such as the database engine, SQL Server Agent and SSIS.
  • Installing and Configuring SQL Server on Linux – We’ll look at our installation and configuration options for SQL Server on Linux, introducing Linux package managers and repositories and install SQL Server on Linux and it’s components.
  • Administering Linux for DBAs – We’ll look at managing services with systemd and how to query journald’s log files for information about SQL Server.  Also dive into file ownership, disk partitioning concepts and mounting file systems and remote file systems.
  • Managing SQL Server on Linux: Administration and Tools – Now that the viewer knows where things are in this new operating system, let’s move up the stack and look at the tooling available for SQL Server on Linux. We’ll cover VS Code, SSMS, SQLCMD and DBFS.
  • High Availability and Disaster Recovery with SQL Server on Linux – Dive into the High Availability and Disaster Recovery options available to SQL Server on Linux
     

Pluralsight Redhat Linux

Check out the course at Pluralsight!

Exit Codes, systemd and SQL Server on Linux

In this blog post we’re going to cover systemd, process exit codes and highlight how systemd reacts in certain exit conditions from SQL Server on Linux. My friend and SQL Server guru Argenis Fernandez – @dbargenis asked about this behavior on Twitter and I’ve been meaning to write this post, so here you go! Also, there’s a Connect item filed by Argenis on this here. Vote!

systemd Basics

Systemd is an initialization daemon, it’s job is to bring the system to usable state. Meaning, it’s responsible for the orderly starting of services on a Linux system. It does much more than that, in fact, one of it’s other core components is journald. Journald stores logging information from systemd units. 

Now with respect to SQL Server on Linux, the information logged into journald is systemd’s information about our mssql.service unit and also the SQL Server Error Log. The SQL Server Error Log lands in here because it’s written to standard out and by default, a service unit’s standard out is written into the journal. 

We can query the information stored in journald with the command journalctl and below is the syntax to query a particular service unit’s log in journald. Running just journalctl will cause the text output to run off the side of the screen without wrapping the text in your terminal. To get the text to wrap, use the following code…then pipe the output into a pager like more or less because you know…less is more. We’ll use this command in the next section.

journalctl -u mssql-server –no-pager | more

Service units in systemd are configured in unit files and SQL Server on Linux’s unit file lives in the file /usr/lib/systemd/system/mssql-server.service. Inside that file we have the following configuration:

# Restart on non-successful exits.

Restart=on-failure

What Restart=on-failure option does, is if a systemd unit returns an unclean exit code, systemd will automatically restart the service. There are other conditions that cause it to restart such as responding to unclean signals, watchdog and also service timeouts. We’ll save those discussions for another day.

Understanding Process Exit Codes

When a process exits, it will return an integer value to the parent process. If a process terminates cleanly, it conventionally returns 0 to the parent process. This means all is well in the world and the process has shut down. In a relational database system this is significant, in that, we really do want our systems to shut down cleanly. Now when things go wrong, that’s when process return a non-zero value.  systemd is the parent process of the sqlservr process and receives it’s exit code. Let’s see what happens in certain exit conditions from SQL Server on Linux.

Let’s say we initiate a shutdown in SQL Server the T-SQL command SHUTDOWN here’s what we’ll get in journald. 

Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51      Server shut down by request from login sa.

Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…

Oct 28 05:28:02 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:28:02.87 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)

 
Now in the output above, you’ll notice a bolded line. In there, you can system that systemd[1] receives a return code from SQL Server of status=1/FAILURE.  Systemd[1] is the parent process to sqlservr, in fact it’s the parent to all processes on our system. It receives the exit code and immediately, systemd initiates a restart of the service due to the configuration we have for our mssql-server systemd unit.
 
What’s interesting is that this happens even on a normal shutdown. But that simply doesn’t make sense, return values on clean exits should return 0. It’s my understanding of the SHUTDOWN command, that it will cause the database engine to shutdown cleanly. 
 
Now let’s say we initiate a SHUTDOWN WITH NOWAIT
 

Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51      Server shut down by NOWAIT request from login sa.

Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…

Oct 28 05:31:27 sqlb.lab.centinosystems.com sqlservr[2035]: 2017-10-28 05:31:27.64 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)


In this output, we see the same behavior, SQL Server returns a status=1/FAILURE and restarts. This makes sense to me, an immediate shutdown should return a non-zero value and it does. 
 
In this post we covered, systemd units and their restart configurations. We looked at exit codes and how they’re used to communicate back to the parent process about the child’s exit status. While it’s just SQL Server…as DBAs we still need to learn about this new operating environment and how it works! Keep learning!
 
For more details on this behavior and it’s configuration I encourage you to read the documentation at this link here. Further, if you’d like a deep dive into systemd check out my Pluralsight course LFCE: Advanced Network and System Administration In this course I cover systemd in great detail, it’s architecture, units and target. 

SQL Server on Linux Content at PASS Summit

PASS Summit is right around the corner and I’ll be there speaking on Monitoring Linux Performance for the SQL Server Admin!

There’s a fantastic amount of SQL Server on Linux content available at Summit. I encourage you to attend one of these sessions. You’ll likely find me at all of these!

Wednesday – 11/1

Thursday – 11/2

Friday – 11/3

 

 

NewImageSummit2017 275x50

SQL Server on Linux – External Memory Pressure

In this blog post we’re going to explore how SQL Server on Linux responds to external memory pressure. On Windows based SQL Server systems we’ve become accustomed to the OS signaling to SQL Server that there’s a memory shortage. When signaled, SQL Server will kindly start shrinking it’s memory caches, including the buffer pool, to maintain overall system stability and usability. We’ll that story is a little different in SQL Server on Linux…let’s look and see how SQL Server on Linux responds to external memory pressure

The Scenario

Our SQL Server on Linux system is a CentOS 7.4 VM, running SQL Server 2017 (RTM). There is 12GB of physical memory and 80% of that is dedicated to SQL Server via mssql-conf about 9.2GB, the default configuration. I’ve also set Max Server memory to 8GB.

So let’s start up SQL Server on Linux and query the buffer pool cache sizes when there’s no data loaded. I’m going to look in two places. sys.dm_os_memory_clerks and also sys.dm_os_buffer_descriptors.  In Figures 1 and 2, we can get a feel for the memory distribution across the system and also inside the buffer pool.

Screen Shot 2017 10 19 at 2 01 04 PM

Figure 1 – Top memory consumers at system startup

Buffer Pool - Cold

 Figure 2 – Buffer Pool memory distribution at system startup

Now, I’m going to load a 8GB table into the buffer pool, maxing out SQL Server’s MAX memory instance setting. Now we’ll look at the allocations again. In Figures 3 and 4, we can see the buffer pool is the dominant consumer of memory and that our database is the dominant consumer in the buffer pool. So far so good, eh?

Memory Clerks - Warm

 Figure 3 – Top memory consumers after server reached maximum memory

Buffer Pool - Warm

 Figure 4 – Buffer Pool memory distribution after table loaded.

Getting Memory Information From Linux

We can use tools like ps, top and htop to look our are virtual and physical memory allocations. We can also look in the /proc virtual file system for our process and look at the status file. In here we’ll find the point in time status of a process, and most importantly the types of memory allocations for a process. We’ll get granular data on the virtual memory allocations and also the resident set size of the process. Here are the interesting values in the status file we’re going to focus on today.

  • VmSize – total current virtual address space of the process
  • VmRSS – total amount of physical memory currently allocated to the process
  • VmSwap – total amount of virtual memory currently paged out to the swap file (disk)

We can use the command pidof sqlservr to find the process ID for our SQL on Linux process, in our case it’s 14689. When running SQL Server on Linux there is always two processes for sqlservr, we’re going to look the one with the higher process ID.

Now to access the status information for our process we look in /proc/14689/status

Below is the output from our process’ status file, filtering for the values we’re interested in. You can see about 10.16GB of Virtual Memory (VmSize), 7.82GB of which is in actual RAM (VmRSS) and 622MB in the swap file (VmSwap). Nothing special to see here, just the baseline information for our process with a populated buffer pool.

VmSize: 10661380 kB

VmRSS:   8201240 kB

VmSwap:   636952 kB

Let’s Add Some Pressure

Using a small C program I wrote, I use calloc to allocate large sections of memory and then I have the program continually write data into the memory buffer to ensure those pages stay in physical memory. Using this program, let’s allocate 4GB of RAM. I’m choosing this value because it’s going to cause my system to exceed it’s physical memory but NOT exceed the amount virtual memory in my system when including the swap file’s size. Things can get really dicey if you exceed physical and virtual memory size, processes will start to get killed. Microsoft documents a case here.

Let’s look at the memory data from /proc/17141/status for my memory allocator program. In the output below you can see we have just about 4GB of virtual memory (VmSize) allocated and nearly the same amount of memory that’s resident in physical memory (VmRSS) and no data in the swap file.

VmSize:  4198476 kB

VmRSS:   4194484 kB

VmSwap:        0 kB

 

Now that 4GB program has caused Linux to need to make room in physical memory for this process. It does this by swapping least recently used pages from memory out to disk. So under memory pressure, let’s look at the SQL Server process’ memory allocations according to Linux. In the output below we see we still have a VmSize of around 10GB, but our VmRSS value has decreased dramatically. In fact, our VmRSS is now only 3.53GB. It decreased by 4.29GB!  VmSwap has increased to 5.04GB. Wow, that’s a huge portion of the process swapped to disk. 

VmSize: 10843320 kB

VmRSS:   3705476 kB

VmSwap:  5289752 kB

What Does SQL Server Think About All of This?

With our memory allocation program running and no workload running inside SQL Server, let’s look at sys.dm_os_memory_clerks and sys.dm_os_buffer_descriptors again. Something interesting happens, in Figures 5 and 6, when we look at sys.dm_os_memory_clerks and sys.dm_os_buffer_descriptors things changed but only slightly  But from the data above we do know that only 3.54GB of the process is actually in memory (VmRSS) at this point. Which means a large portion of SQL Server’s process is actually NOT in memory anymore…they’re on disk. The reality is, we simply do not know which portions of the process are in memory or on disk at this level.  You could use tools like pmap to determine which sections of memory are on disk, but we’ll leave that for another day. Try it out…it’s fun ;)

Swapped out? Why not?

Figure 5 – Top memory consumers after server reached maximum memory and with a significant portion of the process swapped to disk. Just about the same as it was without external memory pressure.

Buffer Pool - Warm but paged

Figure 6 – Buffer Pool memory distribution after table loaded and with a significant portion of the process swapped to disk. Just about the same as it was without external memory pressure.

Things aren’t what they seem

Looking at wait stats during query execution is going to be a little more interesting when SQL Server thinks it’s data pages are in memory, but a significant portion of them are actually on disk. In fact, when querying my test table with a simple SELECT COUNT(*) the wait types don’t indicate the fact that SQL Server is now actually disk bound due to the large portions of the buffer pool being paged out to disk. Under normal conditions, this query takes 11 seconds to run…when we’re paging to disk, it takes 2 minutes and 11 seconds to run. 

So let’s check out the wait types reported during our query output here in Figure 7. We see SOS_SCHEDULER_YIELD…which makes sense…long running query, that’s pretty slow and it’s quantum is being exhausted and the thread is being rescheduled. We do see some disk IO waits with PAGEIOLATCH_SH, but not much…what’s happening here is my data table is 8GB and my Max server memory is also 8GB so there’s a little overflow there and will read from disk for some of the data. From a performance standpoint, SQL Server has no idea that it’s data pages are likely swapped out to in this low memory situation. It thinks those pages are in the buffer pool…they certainly are…but a large chunk of the buffer pool is now actually swapped out to disk. 

Wait stats, who needs wait stats?

Figure 7 – Wait types reported during query execution with a significant portion of the process swapped to disk. Wait stats query from @brento 

What does this all mean?

Well, SQL Server is on a new operating system now, Linux. It’s upon us as DBAs to know how our system works as a whole and what to do when we’re in situations where performance falls off a cliff due to memory contention. This case is certainly a contrived case by putting so much external pressure on our process…but you, the DBA, need to know how SQL Server on Linux is going to react in these conditions and how to get your system back on track when things go south. Good monitoring and also good system resource management are needed. Certainly a properly size system in terms of physical memory and swap file size is needed. We toed the line here in our example ensuring we didn’t exceed physical memory plus swap file size. If you do exceed physical and swap size memory allocations will fail and your process will likely crash or be killed.

Speaking of monitoring, I’m speaking at PASS Summit this year on…you guessed “Monitoring Linux Performance for the SQL Server Admin” on Friday November 3th at 8am. If you want to learn more, I’ll see you there. 

Speaking at IT/Dev Connections 2017 – San Francisco!

I’m proud to announce that I will be delivering two sessions at IT/Dev Connections in San Francisco! This is my second year at IT/Dev Connections, real content for IT pros!

Networking Internals for the SQL Server Professional

Tuesday, 10/24/2017: 1:15 pm – 2:30 pm

Room: Contiental 7

Abstract

Once data leaves your SQL Server do you know what happens or is the world of networking a black box to you? Would you like to know how data is packaged up and transmitted to other systems and what to do when things go wrong? Are you tired of being frustrated with the network team? In this session we introduce how data moves between systems on networks and TCP/IP internals. We’ll discuss real world scenarios showing you how your network’s performance impacts the performance of your SQL Server and even your recovery objectives.

Learning Objectives

Networking Fundamentals, TCP/IP, Network Performance and how these impact SQL Server.

Linux OS Fundamentals for the SQL Admin

Wednesday, 10/25/2017: 1:15 pm – 2:30 pm

Abstract

PowerShell and SQL Server are now available on Linux and management wants you to leverage this shift in technology to more effectively manage your systems, but you’re a Windows admin, Don’t fear! It’s just an operating system. It has all the same components Windows has and in this session we’ll show you that. We will look at the Linux operating system architecture and show you how to interact with and manage a Linux system. By the end of this session you’ll be ready to go back to the office and get started working with Linux.

Learning Objectives

In this session we’ll cover the following  

  • Process control
  • Service control
  • Package installation
  • System resource management (CPU, disk and memory)
  • Using PowerShell to interact with Linux systems

IT/Dev Connections