Monthly Archives: December 2017

Attempting to Run SQL on Linux Inside Windows Subsystem for Linux

Last week, Shawn Melton, MVP and dbatools contributor, had an issue running SQL Server on Linux inside the Windows Subsystem for Linux (WSL).

I didn’t want to leave a brother hanging so I spent this morning digging into this a little bit. 

Reproducing the Issue

The first thing I had to do was reproduce the issue. On my Windows 10 test VM, I installed the Windows Subsystem for Linux following Microsoft’s installation steps, and then I installed the Ubuntu app.

Then I fired up a bash shell in WSL and installed SQL Server on Linux for Ubuntu, following Microsoft’s Ubuntu installation documentation.

Next, I completed the installation of SQL Server on Linux using mssql-conf. When mssql-conf finishes, it attempts to start SQL Server on Linux. BOOM! I was able to reproduce the same error.

Looking at the error, I decided to see if I could run SQL Server on Linux from the shell as the mssql user. This would remove systemd and mssql-conf from the picture. Basically, I wanted to see if I could get a different, more descriptive error to pop out. To do that, we’ll need to switch to the mssql user with su.

And then change into the working directory for SQL Server on Linux and try to launch SQL Server.

Now, doing that…generates the same error! Here’s the error in a search-engine-friendly form :)

Digging a Little Deeper

With the same error output in hand, I gave it a cursory pass with strace to see if I could find anything that would get us closer to understanding why SQL Server on Linux won’t start under the Windows Subsystem for Linux.

What you see in the strace output is the parent process creating the child sqlservr process, which then fails. In the first line of output, you can see process 137 call clone, which returns process ID 139; this is how a parent process creates a child in Linux. Process 139 then performs some setup operations, such as registering signal actions with rt_sigaction, which tell the kernel which routine to call when the process receives a given signal.
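To make the clone/rt_sigaction pattern concrete, here’s a minimal Python sketch, assuming Linux and CPython 3.8 or later. It is not what sqlservr does internally; it only reproduces the same shape of system calls: os.fork() is built on glibc’s fork(), which uses the clone system call on Linux, and signal.signal() registers a handler through rt_sigaction. The function name spawn_child_with_handler is hypothetical. Running the script under `strace -f` should show a clone in the parent and rt_sigaction calls in the child.

```python
import os
import signal
import time


def spawn_child_with_handler() -> int:
    """Hypothetical helper: fork a child that installs a SIGUSR1 handler and signals itself.

    Returns the child's exit code as seen by the parent: 0 if the handler ran.
    """
    pid = os.fork()  # on Linux, glibc's fork() is built on the clone system call
    if pid == 0:
        try:
            fired = []

            def on_usr1(signum, frame):
                fired.append(signum)

            # signal.signal() registers the handler through rt_sigaction on Linux.
            signal.signal(signal.SIGUSR1, on_usr1)
            signal.raise_signal(signal.SIGUSR1)
            # CPython runs Python-level handlers between bytecodes, so poll briefly.
            for _ in range(100):
                if fired:
                    break
                time.sleep(0.01)
            code = 0 if fired == [signal.SIGUSR1] else 1
        except BaseException:
            code = 2
        # os._exit() leaves immediately, so the child never returns into caller code.
        os._exit(code)

    # Parent: fork() returned the child's PID, as clone returned 139 to process 137.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print("child exit code:", spawn_child_with_handler())
```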

Now, the only error I found in the output is a prctl call that returns EINVAL (invalid argument). prctl is the system call for performing operations on a process. The option being set, PR_SET_PTRACER, lets a process declare which other process may ptrace it, and it belongs to the Yama Linux Security Module (LSM), whose settings normally live in /proc/sys/kernel/yama. That directory doesn’t exist on my Ubuntu WSL installation. I checked my full CentOS VMs, and it exists there. I checked a full Ubuntu installation, and it’s there too.
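To check both observations on a given machine, here’s a minimal Python sketch, assuming Linux with a C library that exports prctl. It tests for the /proc/sys/kernel/yama directory and issues prctl(PR_SET_PTRACER, 0) through ctypes. The constant’s value (0x59616d61, the ASCII bytes "Yama") comes from <linux/prctl.h>; the helper names yama_present and try_set_ptracer are hypothetical. Passing 0 clears any previously declared ptracer, so the call is harmless. On a kernel without Yama, I’d expect EINVAL, which would match the strace output, but WSL translates Linux system calls rather than running a Linux kernel, so its behavior may differ.

```python
import ctypes
import errno
import os
import sys

# From <linux/prctl.h>: PR_SET_PTRACER is the ASCII bytes "Yama".
PR_SET_PTRACER = 0x59616D61
YAMA_DIR = "/proc/sys/kernel/yama"


def yama_present() -> bool:
    """Hypothetical helper: True if the Yama LSM exposes its sysctls here."""
    return os.path.isdir(YAMA_DIR)


def try_set_ptracer(pid: int = 0) -> int:
    """Hypothetical helper: call prctl(PR_SET_PTRACER, pid).

    Returns 0 on success or the errno value on failure. pid=0 clears any
    previously declared ptracer, so the call has no lasting effect.
    """
    if not sys.platform.startswith("linux"):
        raise OSError(errno.ENOSYS, "prctl is Linux-only")
    libc = ctypes.CDLL(None, use_errno=True)  # symbols already loaded in this process
    rc = libc.prctl(PR_SET_PTRACER, ctypes.c_ulong(pid),
                    ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    return 0 if rc == 0 else ctypes.get_errno()


if __name__ == "__main__":
    print("Yama sysctls present:", yama_present())
    err = try_set_ptracer()
    print("prctl(PR_SET_PTRACER):", "ok" if err == 0 else errno.errorcode.get(err, err))
```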

After the error, SQL Server calls tgkill and kills itself with the SIGABRT signal. A dump occurs and the program exits.
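The self-termination step can be reproduced in isolation. This is a minimal Python sketch, assuming Linux and glibc, with a hypothetical helper name child_aborts: a forked child calls os.abort(), which invokes the C library’s abort(); glibc raises SIGABRT against the calling thread using tgkill. The parent then confirms from the wait status that the child died from SIGABRT. Core files are disabled in the child so the demo doesn’t leave a dump behind.

```python
import os
import resource
import signal


def child_aborts() -> int:
    """Hypothetical helper: fork a child that aborts; return the signal that killed it."""
    pid = os.fork()
    if pid == 0:
        try:
            # Keep this demo from writing a core file.
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
            # Restore the default SIGABRT action in case a tool (e.g. faulthandler) hooked it.
            signal.signal(signal.SIGABRT, signal.SIG_DFL)
            # abort(3) sends SIGABRT to the calling thread; glibc does this with tgkill.
            os.abort()
        finally:
            os._exit(1)  # only reached if setup failed before abort()
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0


if __name__ == "__main__":
    sig = child_aborts()
    print("child terminated by:", signal.Signals(sig).name if sig else "normal exit")
```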

What’s Really Happening?

Well, I think something is missing from the Windows Subsystem for Linux. Is it the Yama stuff? Perhaps. But clearly SQL Server isn’t happy with the environment and kills itself. I haven’t dug into WSL yet and don’t know how it’s implemented, so the problem could also be at that level. Generally, I don’t write blog posts when I don’t know exactly what’s going on, but I did want to let folks know that SQL Server on Linux doesn’t work on the Windows Subsystem for Linux.


A Novel Idea for High Availability in SQL Server on Linux

Over the past year, we’ve learned how SQL Server on Linux is implemented on top of SQLPAL, the SQL Platform Abstraction Layer, and the team is pretty confident in its architectural decisions, as indicated in this post.

Having this wrapper around SQL Server opens up some interesting opportunities…perhaps we can leverage SQLPAL to facilitate some new high availability techniques.

When I was in graduate school, I worked on a research project that became my master’s thesis. In this work, I developed a technique that synchronized the process address space of a virtual machine across two separate physical hypervisors. The technique involved an initial copy of all pages between the two systems, followed by selectively copying the virtual machine’s pages as they became dirty. Using this technique, the process address space of the virtual machine stays synchronized between the two hypervisors. This significantly reduced the amount of information that had to be replicated between the hypervisors, but more importantly, it kept the virtual machine’s memory in sync, which meant that if the hypervisor hosting the virtual machine crashed, we could theoretically start the virtual machine on the second hypervisor.
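The two-phase idea (copy everything once, then copy only what changes) can be modeled in a few lines. This is a toy Python sketch, not the thesis implementation: memory is assumed to be a list of 4 KiB pages, every write is assumed to go through a write() method so dirty pages can be recorded, and the names TrackedMemory, initial_copy, and sync_dirty are made up for illustration.

```python
PAGE_SIZE = 4096


class TrackedMemory:
    """Toy guest memory (hypothetical) that remembers pages written since the last sync."""

    def __init__(self, num_pages: int) -> None:
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()

    def write(self, addr: int, data: bytes) -> None:
        # Real hypervisors catch writes with page write-protection or dirty bits;
        # in this toy model every write simply goes through this method.
        for offset, byte in enumerate(data):
            page, index = divmod(addr + offset, PAGE_SIZE)
            self.pages[page][index] = byte
            self.dirty.add(page)


def initial_copy(primary: TrackedMemory, replica: TrackedMemory) -> int:
    """Phase 1: copy every page, then start dirty tracking from a clean slate."""
    for i, page in enumerate(primary.pages):
        replica.pages[i][:] = page
    primary.dirty.clear()
    return len(primary.pages)


def sync_dirty(primary: TrackedMemory, replica: TrackedMemory) -> int:
    """Phase 2, repeated: copy only the pages dirtied since the last sync."""
    copied = sorted(primary.dirty)
    for i in copied:
        replica.pages[i][:] = primary.pages[i]
    primary.dirty.clear()
    return len(copied)


if __name__ == "__main__":
    primary, replica = TrackedMemory(256), TrackedMemory(256)
    print("initial copy:", initial_copy(primary, replica), "pages")  # 256
    primary.write(0, b"hello")
    primary.write(10 * PAGE_SIZE - 2, b"span")  # straddles pages 9 and 10
    print("dirty copy:", sync_dirty(primary, replica), "pages")  # 3
    print("in sync:", primary.pages == replica.pages)  # True
```

After the one-time full copy, each sync moves only three pages instead of 256, which is the reduction in replicated data the paragraph describes.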

Now, during my PASS Summit talk this year, I presented my theory that SQLPAL is virtualization. But it’s not machine virtualization; it’s process virtualization. That means there’s a purpose-built environment hosting the SQL Server process. This environment, SQLPAL, is the main allocator of resources from the physical system. It’s the thing that asks the underlying operating system for memory, disk, network, or anything else that’s needed.

Now, what if we took these two ideas and brought them together? What if SQLPAL were able to synchronize the program state and resources between two separate systems? Could we provide highly available SQL Server services with a technique like this? I think we can. Perhaps we wouldn’t even synchronize the pages between the systems. Perhaps an even lighter technique could be used, such as duplicating the system calls between the two copies of SQL Server and thus implicitly synchronizing the program state.
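The “duplicate the system calls” idea is essentially state-machine replication: if every request a process makes of its host is applied, in the same order, to two copies, the copies end up in the same state. Here’s a toy Python sketch of that idea, under loud assumptions: SQLPAL’s real interface isn’t modeled, and the call names alloc and write_file, plus the classes HostCall, ToyHost, and ReplicatedHost, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class HostCall:
    """One request the hosted process makes of the host layer (hypothetical names)."""
    name: str
    args: tuple


@dataclass
class ToyHost:
    """Hypothetical stand-in for host resources handed out by a PAL-like layer."""
    allocations: dict = field(default_factory=dict)
    files: dict = field(default_factory=dict)
    next_handle: int = 1

    def apply(self, call: HostCall) -> Any:
        if call.name == "alloc":
            (size,) = call.args
            handle = self.next_handle
            self.next_handle += 1
            self.allocations[handle] = size
            return handle
        if call.name == "write_file":
            path, data = call.args
            self.files[path] = self.files.get(path, b"") + data
            return len(data)
        raise ValueError(f"unknown host call {call.name!r}")


class ReplicatedHost:
    """Apply every call on the primary, then ship it in order to a standby."""

    def __init__(self) -> None:
        self.primary = ToyHost()
        self.standby = ToyHost()
        self.log: List[HostCall] = []

    def call(self, name: str, *args: Any) -> Any:
        c = HostCall(name, args)
        result = self.primary.apply(c)
        self.log.append(c)  # in a real system this would cross the interconnect
        if self.standby.apply(c) != result:
            raise RuntimeError("standby diverged from primary")
        return result

    def fresh_standby_from_log(self) -> ToyHost:
        """A brand-new standby catches up by replaying the ordered log."""
        host = ToyHost()
        for c in self.log:
            host.apply(c)
        return host


if __name__ == "__main__":
    rh = ReplicatedHost()
    rh.call("alloc", 8192)
    rh.call("write_file", "toy.log", b"txn-1;")
    print("in sync:", rh.primary == rh.standby == rh.fresh_standby_from_log())  # True
```

The hard part this sketch hides is determinism: anything the two copies could observe differently, such as the clock, random numbers, or thread scheduling, would also have to be captured in the log and replayed.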

Think about the possibilities…we could have a system that fails over with all the context of the currently active system. Active connections could stay active, the buffer pool would remain populated, and the plan cache would still exist rather than having to be rebuilt. Yes, we’ll likely need some sort of low-latency, high-bandwidth interconnect…but we have those. And there are certainly more implementation details to think through…but I think there’s something here.

A couple questions I thought of while writing this…

1. Does this provide more value than Availability Groups? I think so…program state remains in sync between the two systems. So things like user connections could be maintained during failover (with the appropriate relocation of the IP of course). I also think the quorum model would be simpler, as there is only one pair in the synchronization.

2. Does this provide more value than virtual machine migration? Perhaps. This technique could be hypervisor-independent.

I’d love to hear your thoughts on this! Most of all, I want you to start thinking about new ways we can leverage SQLPAL and its abstraction from hardware.