Category Archives: SQL

Deploying SQL Server Availability Groups in Kubernetes

In this blog post, we’re going to work on deploying a SQL Server Availability Group in a Kubernetes cluster running on on-premises virtual machines. I’m going to walk you through the process as it’s documented by Microsoft at this link here. That document is very good, but it only shows you how to do this in Azure; we’re going to do it in VMs. I’m going to follow Microsoft’s documentation as much as possible, deviating only when necessary for on-premises deployments. I’m also going to explain the key Kubernetes concepts that you need to know to understand how all these pieces fit together. This is a long one, buckle up.

Creating Your Cluster

In my last blog post, I showed you how to create a three-node, virtual machine based Kubernetes cluster on premises. I’m going to use the same setup for this blog post. Check out how to create this cluster at this link here. There’s a reason I wrote that post first :)

Process Overview

Here’s the big picture of all the steps we will perform in this demonstration:

  • Create a Namespace
  • Create Secrets
  • Create a Storage Class and mark it default
  • Create Persistent Volumes
  • Create a ServiceAccount, ClusterRole, ClusterRoleBinding and a Deployment
  • Deploy the SQL Server Pods
  • Expose the Availability Group Services as a Kubernetes Service
  • Connect to our Availability Group from outside our cluster
  • Create a database

I’m going to be running these commands from the cluster master, but you can run them from any host that can talk to your API Server. We’re going to execute the first few commands using kubectl directly, then we’re going to switch to loading our configurations from "deployment manifests", YAML files that describe our configuration and, in this case, the Availability Group Deployment.

Create a Namespace

First up, we’ll create a Namespace. In Kubernetes you can use namespaces to put boundaries around resources for organizational or security reasons.  

demo@k8s-master1:~/ag$ kubectl create namespace ag1

namespace/ag1 created


demo@k8s-master1:~/ag$ kubectl get namespaces

NAME          STATUS   AGE

ag1           Active   11m

default       Active   28h

kube-public   Active   28h

kube-system   Active   28h

Create Secrets

In Kubernetes, the cluster store can hold Secrets…in other words sensitive data like passwords. This is valuable because we don’t want to store this information in our containers and we certainly don’t want to have our passwords as clear text in our deployment manifests. So in those manifests, we can reference these values and then upon deployment the Pods will retrieve the secrets when they’re started and pass the secret into the container for the application to use. In this case, we’re creating two secrets to be used by our Pods. The first is the SA password we’ll use for our SQL Server Instance, the second is the password for our Service Master Key which is behind the certificates that are used to authenticate the Availability Group (*cough*) Database Mirroring endpoints.

Let’s create the secrets with kubectl.

demo@k8s-master1:~/ag$ kubectl create secret generic sql-secrets --from-literal=sapassword="1-S0methingS@Str0ng" --from-literal=masterkeypassword="2-S0methingS@Str0ng" --namespace ag1

secret/sql-secrets created

 
Want to know how to read a secret out of the cluster store?
 
Well here you go, just change the masterkeypassword string to the name of the secret you want to decode. 
 

demo@k8s-master1:~/ag$ kubectl get secret sql-secrets -o yaml --namespace ag1 | grep masterkeypassword | awk '{ print $2 }' | base64 --decode

2-S0methingS@Str0ng
demo@k8s-master1:~/ag
 

Create a Storage Class and Mark it Default

OK, now it’s time to work on our storage configuration. This is where we’re going to deviate most from the Microsoft documentation. We’re going to start off by creating a StorageClass. A StorageClass is a way to group storage together based on the storage’s attributes. In this case, we’re going to use a StorageClass because later on, the YAML files provided by Microsoft make storage requests of the cluster that reference the default storage class in the namespace. So that’s what this code does: it creates the local-storage StorageClass and marks it as the default.

Here is the YAML definition of our StorageClass, save this into a file called StorageClass.yaml.

File 1 – StorageClass.yaml
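
If you don’t have the file from Microsoft’s samples handy, here’s a minimal sketch of what it needs to contain, based on what the rest of this walkthrough depends on: the local-storage name, the no-provisioner setting (since we’re creating the PersistentVolumes by hand), and the annotation that marks it as the default class.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
  annotations:
    # This annotation is what marks the StorageClass as the default for the cluster
    storageclass.kubernetes.io/is-default-class: "true"
# No dynamic provisioner; we create the PersistentVolumes manually in the next step
provisioner: kubernetes.io/no-provisioner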

Once you have that code saved into StorageClass.yaml, go ahead and run the following command to pass the StorageClass manifest to the API server for it to create the resource for you.

demo@k8s-master1:~/ag$ kubectl apply -f StorageClass.yaml --namespace ag1

storageclass.storage.k8s.io/local-storage created
 

demo@k8s-master1:~/ag$ kubectl get storageclass --namespace ag1

NAME                      PROVISIONER                    AGE

local-storage (default)   kubernetes.io/no-provisioner   3h20m

Create Persistent Volumes

Next up is creating our actual storage. In Kubernetes, the cluster provides storage to Pods and the Pods request the storage. Cluster storage can be many different types: NFS, virtual disks from your cloud provider, local storage and many more. Local storage is storage that’s attached to the Kubernetes Node itself. In this demonstration, we’re going to use local storage. Since we’re deploying an Availability Group that will put a Pod on each Node in our cluster, we’re going to need to define three PersistentVolumes, one on each Node. Looking at the code in File 2 – PV.yaml (below) you will see three PersistentVolumes, each with a different name, all pointing to /var/opt/mssql. This path will be local on each Node in our cluster, so we will need to make a /var/opt/mssql directory on each Node; go ahead and do that now. Storage in HA systems requires understanding what data lives where in your system and layering the appropriate data protections to meet your recovery objectives. This configuration places the storage on each Node in your Kubernetes cluster. For more information on the local storage type in Kubernetes check out this post here. In that post they specifically call out that this PersistentVolume type is appropriate for "distributed storage systems that shard or replicate data across multiple nodes", and Availability Groups fall into that category.

In an effort to loosely couple Pods and their storage, the cluster administrator defines PersistentVolumes; that’s what you’ll do when you run the code in File 2 – PV.yaml below. Then the Pods use PersistentVolumeClaims to attach themselves to those PersistentVolumes.

File 2 – PV.yaml
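
I’m not reproducing Microsoft’s file here, but based on the kubectl get output further down (10Gi, ReadWriteOnce, Retain, the local-storage class), each of the three PersistentVolumes is shaped roughly like the sketch below. This is a hedged example using a hostPath volume pointed at /var/opt/mssql; repeat it three times, changing the name to ag1-pv-volume-node1 through ag1-pv-volume-node3.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: ag1-pv-volume-node1          # node2 and node3 follow the same pattern
spec:
  storageClassName: local-storage    # matches the default StorageClass we just created
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/var/opt/mssql"           # the directory you created on each Node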

Once you have that code saved into PV.yaml, go ahead and run the following command to pass the PersistentVolume manifest to the API server for it to create the resource for you.

demo@k8s-master1:~/ag$ kubectl apply -f pv.yaml --namespace ag1

persistentvolume/ag1-pv-volume-node1 created

persistentvolume/ag1-pv-volume-node2 created

persistentvolume/ag1-pv-volume-node3 created


demo@k8s-master1:~/ag$ kubectl get PersistentVolume --namespace ag1

NAME                  CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE

ag1-pv-volume-node1   10Gi       RWO            Retain           Available           local-storage            29s

ag1-pv-volume-node2   10Gi       RWO            Retain           Available           local-storage            29s

ag1-pv-volume-node3   10Gi       RWO            Retain           Available           local-storage            29s

Create a ServiceAccount, ClusterRole, ClusterRoleBinding and a Deployment

Now that we have our storage created, it’s time to start creating the resources for our cluster. In the samples provided by Microsoft, they provide operator.yaml. For our deployment, we’ll use that file just as it comes from Microsoft; I have the contents below for you. The core details of the operator.yaml manifest are that it creates a ServiceAccount, a ClusterRole and a ClusterRoleBinding, and also defines the Deployment for the Kubernetes Operator Pod for our Availability Group. Basically, this sets up the security contexts in the cluster for our Pods and also kicks off the Deployment of the Availability Group Kubernetes Operator Pod. A Deployment is a Kubernetes object that will ensure that the defined Pod is up and running at all times. Looking towards the bottom of the file you’ll see replicas: 1; this defines that there must be exactly one Operator Pod, using the 2019-CTP2.1-ubuntu container, at all times.

File 3 – operator.yaml 
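
I won’t re-type Microsoft’s entire file, but to make the Deployment piece concrete, here’s an abbreviated, hand-written sketch of roughly what that section looks like. The image tag and command here are assumptions on my part; use the operator.yaml from Microsoft’s samples for the real thing, which also contains the ServiceAccount, ClusterRole and ClusterRoleBinding definitions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-operator
  namespace: ag1
spec:
  replicas: 1                             # exactly one Operator Pod at all times
  selector:
    matchLabels:
      app: mssql-operator
  template:
    metadata:
      labels:
        app: mssql-operator
    spec:
      serviceAccountName: mssql-operator  # the ServiceAccount created earlier in the file
      containers:
      - name: mssql-operator
        # Assumed image reference for the 2019-CTP2.1-ubuntu operator container
        image: mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu
        command: ["/mssql-server-k8s-operator"]   # assumed entrypoint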

Now let’s go ahead and deploy operator.yaml and create the resources. 

demo@k8s-master1:~/ag$ kubectl apply -f operator.yaml --namespace ag1

Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply

namespace/ag1 configured

serviceaccount/mssql-operator created

clusterrole.rbac.authorization.k8s.io/mssql-operator-ag1 created

clusterrolebinding.rbac.authorization.k8s.io/mssql-operator-ag1 created

deployment.apps/mssql-operator created

 
If you want to inspect the resources further here are some commands to do that. I’ll leave this as an exercise for the reader.
 

kubectl get ServiceAccount --namespace ag1

kubectl describe ClusterRole mssql-operator-ag1 --namespace ag1

kubectl describe ClusterRoleBinding mssql-operator-ag1 --namespace ag1

kubectl get deployment --namespace ag1

kubectl describe deployment --namespace ag1

Before moving on, we’ll want to make sure our Deployment is finished and ready. We can use kubectl get deployment for that; the key here is that DESIRED and CURRENT are both 1. This means our Operator is online and ready.

demo@k8s-master1:~/ag$ kubectl get deployment --namespace ag1

NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE

mssql-operator   1         1         1            1           4m38s


If you want to double check that, you can make sure your Pods are online with kubectl get pods. There you can see the operator Pod’s status is Running.
 

demo@k8s-master1:~/ag$ kubectl get pods --namespace ag1

NAME                              READY   STATUS    RESTARTS   AGE

mssql-operator-6d88564d97-hqz8r   1/1     Running   0          7m8s

Deploy the SQL Server Pods

OK, now that the storage, security and the Operator are deployed, it’s time to deploy the Pods for our SQL Server Availability Group replicas. In File 4 – sqlserver.yaml (below) we define the resources needed to get our Pods up and running. But this file is a little bit different…it’s using Custom Resources. Your first hint is the apiVersion: it references mssql.microsoft.com/v1, and the kind is SqlServer. Microsoft has chosen to implement a custom object, likely so that it can abstract away some of the complexity of creating this configuration.

In this file, you are going to see a lot of things come together from the other configurations we’ve made so far in this post. You’ll find that the ag1 namespace is referenced, you’ll see the storageClass is default, and the saPassword being read from the Cluster’s secrets.

In addition to the custom SqlServer resource, in this file, you will also find the Service object defined for each Pod replica. The Service provides persistent access to the Pod’s application. In this file, we create three services all on 1433. This port is the port listening on the cluster network. 

I did make one change to this file from the original Microsoft file: I changed the Service type from LoadBalancer to NodePort. Doing this exposes the Service, which is listening on 1433 inside the Pod and on the Pod network, on the Node’s real IP address, binding the Pod’s internal 1433 port to an available port on the Node. This means I can connect to each of the SQL Server instances from outside the Kubernetes cluster using the IP address of the virtual machine (Node) and the NodePort. The LoadBalancer type is used in cloud scenarios where the cloud provider stands up a load balancer resource for you. We’re doing this on-prem, so this had to change.

File 4 – sqlserver.yaml
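
Grab the real file from Microsoft’s samples; to make the discussion above concrete, here’s an abbreviated sketch of one replica’s definition and its Service, with the LoadBalancer-to-NodePort change already made. The exact spec field names and image tags are my reading of Microsoft’s sample and may drift between CTPs, so treat this as illustrative.

apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql1, type: sqlservr}
  name: mssql1
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu   # assumed tag
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default               # resolves to our default local-storage class
  saPassword:
    secretKeyRef: {name: sql-secrets, key: sapassword}    # read from the Secret we created
  sqlServerContainer: {image: mcr.microsoft.com/mssql/server:2019-CTP2.1-ubuntu}   # assumed tag
---
apiVersion: v1
kind: Service
metadata: {name: mssql1, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql1, type: sqlservr}
  type: NodePort                        # changed from LoadBalancer for on-premises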

Let’s now deploy these Pods and Services by passing sqlserver.yaml into our API Server. This might take a minute or two as the containers are downloaded from the Internet and started. In the output below you can see the three SqlServer resources are created and the three Services are created. I do want to call out that after you create these, you will see six Pods when you use the kubectl get pods command. The "initialize" Pods are run as Kubernetes Jobs that configure the Availability Group replicas; these run only once and then complete. A successful deployment will have the three mssqlN-0 Pods up with a status of Running and the initialize Pods showing Completed.

demo@k8s-master1:~/ag$ kubectl apply -f sqlserver.yaml --namespace ag1

sqlserver.mssql.microsoft.com/mssql1 created

service/mssql1 created

sqlserver.mssql.microsoft.com/mssql2 created

service/mssql2 created

sqlserver.mssql.microsoft.com/mssql3 created

service/mssql3 created
 

demo@k8s-master1:~/ag$ kubectl get pods --namespace ag1

NAME                              READY   STATUS      RESTARTS   AGE

mssql-initialize-mssql1-klhzh     0/1     Completed   0          7m

mssql-initialize-mssql2-8vd6r     0/1     Completed   0          6m57s

mssql-initialize-mssql3-4tjvd     0/1     Completed   0          6m54s

mssql-operator-6d88564d97-hqz8r   1/1     Running     0          30m

mssql1-0                          2/2     Running     0          7m

mssql2-0                          2/2     Running     0          6m58s

mssql3-0                          2/2     Running     0          6m55s

If you want to get a peek at what’s been written to standard out in a Pod you can use kubectl logs. Let’s do that for one of the initialize Pods above to see what they did for us. In the output below you can see it’s the job of the initialize container to configure the Availability Group on the Pod that’s running our AG Replica on that same Node. There’s an initialize Pod for each replica on each Node.

demo@k8s-master1:~/ag$ kubectl logs mssql-initialize-mssql1-klhzh --namespace ag1

2018/11/13 02:26:14 Using randomly generated master key password

2018/11/13 02:26:14 Statefulset has 1 replicas but could only find 0

Found pod [mssql1-0] with owner [7d5ef98d-e6eb-11e8-816e-000c296ac327]2018/11/13 02:26:15 [192.168.3.5] Setting sa pasword…

ERROR: 2018/11/13 02:26:15 [192.168.3.5] Could not connect: Unresponsive or down Unable to open tcp connection with host '192.168.3.5:1433': dial tcp 192.168.3.5:1433:

…output omitted…

ERROR: 2018/11/13 02:26:28 [192.168.3.5] Could not connect: Unresponsive or down Unable to open tcp connection with host '192.168.3.5:1433': dial tcp 192.168.3.5:1433: getsockopt: connection refused

ERROR: 2018/11/13 02:26:28 [192.168.3.5] Both old and new sa password failed

ERROR: 2018/11/13 02:26:29 [192.168.3.5] Could not connect: Unresponsive or down Login error: mssql: Login failed for user 'sa'.

ERROR: 2018/11/13 02:26:29 [192.168.3.5] Could not connect: Unresponsive or down Login error: mssql: Login failed for user 'sa'.

ERROR: 2018/11/13 02:26:29 [192.168.3.5] Both old and new sa password failed

ERROR: 2018/11/13 02:26:30 [192.168.3.5] Could not connect: Unresponsive or down Login error: mssql: Login failed for user 'sa'.

2018/11/13 02:26:31 [192.168.3.5] sa password updated

2018/11/13 02:26:31 [192.168.3.5] Creating master key

2018/11/13 02:26:31 [192.168.3.5] Creating sql login dbm-mssql1 if it does not already exist

2018/11/13 02:26:31 [192.168.3.5] Creating sql user dbm-mssql1 if it does not already exist

2018/11/13 02:26:31 [192.168.3.5] Granting dbm-mssql1 availability group permissions

2018/11/13 02:26:31 [192.168.3.5] Granting dbm-mssql1 control to the database mirroring endpoint, dbm, to if it already exists

2018/11/13 02:26:31 [192.168.3.5] Granting  dbm-mssql1 control to the certificates used on the database mirroring endpoint, dbm

2018/11/13 02:26:31 [192.168.3.5] Granting dbm-mssql1 control to availibility group ag1

2018/11/13 02:26:31 Updating secret to note that the initialization is complete

2018/11/13 02:26:31 Uploading cert secret for local instance…

2018/11/13 02:26:31 Initialization complete

 
Here are some more commands for you to further investigate the Availability Group’s Kubernetes configuration. Poke around in there and see what’s up.
 

demo@k8s-master1:~/ag$ kubectl describe pods  -n ag1

demo@k8s-master1:~/ag$ kubectl describe statefulset --namespace ag1

 

Expose the Availability Group Services as a Service

OK, so we’ve created the AG replicas in our Kubernetes cluster. Now it’s time to create Kubernetes Services for access to the primary and read-only replicas. Kubernetes handles networking for us; it’s responsible for adding persistency to the ephemeral container world. So these Services are going to be our applications’ primary access points for connecting to SQL Server. For our on-premises scenario, we’re going to deviate from Microsoft’s documentation, similar to what we had to do when we exposed each individual replica as a Service. In the ag-services.yaml file provided in the Microsoft examples, the AG Services are of type LoadBalancer. That Service type is used in cloud scenarios, and we don’t have a load balancer in our on-premises configuration. So we’re going to change this again to NodePort. The reason I’m choosing NodePort here is to expose the Service outside the cluster; this way I will be able to use my own tools to access the SQL Server instance from outside the cluster.

Here’s the code for our ag-services.yaml Services manifest.
File 5 – ag-services.yaml
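
If you’re building the file by hand, here’s a hedged sketch of the primary Service with the NodePort change applied. The selector matches what kubectl describe reports below; the secondary Service is identical except for its name and a selector of secondary.

apiVersion: v1
kind: Service
metadata:
  name: ag1-primary
  namespace: ag1
spec:
  type: NodePort                  # changed from LoadBalancer for on-premises
  ports:
  - name: tds
    port: 1433
    targetPort: 1433
  selector:
    # Routes traffic to whichever replica currently holds the primary role
    role.ag.mssql.microsoft.com/ag1: primary
    type: sqlservr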

Now let’s apply that manifest to create our AG Services in our cluster.  

demo@k8s-master1:~/ag$ kubectl apply -f ag-services.yaml --namespace ag1

service/ag1-primary created

service/ag1-secondary created


And take a quick peek at our Service configuration. You’ll see we don’t have an External-IP as described in Microsoft’s walk-through; that would come from the LoadBalancer. Since we’re using NodePort, our IP address will be that of the actual Nodes.

demo@k8s-master1:~/ag$ kubectl get service --namespace ag1

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE

ag1             ClusterIP   None             <none>        1433/TCP,5022/TCP   9h

ag1-primary     NodePort    10.110.116.77    <none>        1433:30625/TCP      2m6s

ag1-secondary   NodePort    10.100.132.228   <none>        1433:30380/TCP      2m6s

mssql1          NodePort    10.103.126.32    <none>        1433:32572/TCP      9h

mssql2          NodePort    10.102.175.227   <none>        1433:30001/TCP      9h

mssql3          NodePort    10.107.215.217   <none>        1433:31539/TCP      9h


What IP and port do we connect our applications to for the Availability Group primary replica? For the IP address, it’s the real address of the Node, in other words the actual IP address of your virtual machine. For the port, let’s use kubectl describe. Here you can see the NodePort is 30625. The port is allocated by the cluster, so yours will vary, since it can be any port in the cluster’s NodePort range. We can point our applications at any Node in our cluster on THAT port and the cluster will redirect the traffic to the correct Node internally. So for me, it’s k8s-master1:30625.

demo@k8s-master1:~/ag$ kubectl describe service  ag1-primary -n ag1

Name:                     ag1-primary

Namespace:                ag1

Labels:                   <none>

Annotations:              kubectl.kubernetes.io/last-applied-configuration:

                            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"ag1-primary","namespace":"ag1"},"spec":{"ports":[{"name":"tds","p…

Selector:                 role.ag.mssql.microsoft.com/ag1=primary,type=sqlservr

Type:                     NodePort

IP:                       10.110.116.77

Port:                     tds  1433/TCP

TargetPort:               1433/TCP

NodePort:                 tds  30625/TCP

Endpoints:                192.168.3.5:1433

Session Affinity:         None

External Traffic Policy:  Cluster

Events:                   <none>


Let’s use sqlcmd to test external connectivity to our SQL Server instance. For the first test we’ll connect to k8s-master1 on port 30625; for the second we’ll connect to k8s-node1’s IP address (remember, that’s coming from DHCP, so you’ll have to go get it for your network) on port 30625. In the output below you can see we always hit the primary replica for our AG, mssql1-0. To use the name k8s-master1, I have an entry in my local hosts file to resolve the name to the IP of the server. You may want to do the same.

Anthonys-MacBook-Pro:~ aen$ sqlcmd -S k8s-master1,30625 -Q "SELECT @@SERVERNAME" -U sa -p

Password: 

                                                                                                                                

--------------------------------------------------------------------------------------------------------------------------------

mssql1-0                                                                                                                        

 

(1 rows affected)

 

Network packet size (bytes): 4096

1 xact[s]:

Clock Time (ms.): total         2  avg   2.0 (500.0 xacts per sec.)


Anthonys-MacBook-Pro:~ aen$ sqlcmd -S 172.16.94.136,30625 -Q "SELECT @@SERVERNAME" -U sa -p

Password: 

                                                                                                                                

--------------------------------------------------------------------------------------------------------------------------------

mssql1-0                                                                                                                        

 

(1 rows affected)

 

Network packet size (bytes): 4096

1 xact[s]:

Clock Time (ms.): total         1  avg   1.0 (1000.0 xacts per sec.)

In Figure 1, you can find the connection information to connect to our AG primary with Azure Data Studio. Remember, I have that host name in my hosts file, so create that entry if you need one.


Figure 1 – Connection Parameters for Azure Data Studio

Create an Availability Group Database

Now with everything up and running, let’s create an Availability Group database. We’re going to depend on direct seeding to copy the database to each replica in our Availability Group; pretty nice feature, eh? Take the text below and use sqlcmd or Azure Data Studio to create the database. In the code below, see how I’m backing up to nul? If this were a real database, that would be a bad idea; I’m just doing it to get the database ready to be added to the AG.
Listing 5 – Create your Availability Group Database
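
The original listing isn’t embedded here, so here’s a minimal T-SQL sketch of the idea: create the database, take the throwaway backup to nul so the database is in a valid state to join the AG, then add it to ag1. The database name is just an example.

CREATE DATABASE [TestAG1];
GO
-- An AG database must be in FULL recovery and have at least one full backup
ALTER DATABASE [TestAG1] SET RECOVERY FULL;
GO
-- Backing up to nul only satisfies that requirement; never do this to a real database
BACKUP DATABASE [TestAG1] TO DISK = N'nul';
GO
-- Direct seeding copies the database to each replica for us
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [TestAG1];
GO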
 
Now let’s check the status of our AG configuration with Azure Data Studio. Here’s the query to pull the AG configuration information:
Query 1 – Availability Group Configuration Query
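
The query itself isn’t shown here, but something along these lines against the AG DMVs will give you the replica names, endpoint URLs, roles and synchronization health you see in Figure 2. This is a sketch, not necessarily the exact query I ran.

SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.endpoint_url,                    -- these land on the 192.168.0.0/16 Pod network
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar
  ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars
  ON ar.replica_id = ars.replica_id
ORDER BY ars.role_desc;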
 
 
 
Figure 2 – Availability Group Configuration 
 
One thing that’s interesting to call out in the output in Figure 2 is that the endpoint_url values are on the 192.168.x.y network. This is the Pod network. Remember, way back when we created our cluster together in my previous blog post, we needed to ensure that our Pods have the ability to communicate with each other…so we deployed the Calico Pod network. The network range on that was 192.168.0.0/16. That’s this network, our Pods’ network.

Wrap up

If you want to get a peek at all the resources you created today, use the following command. It will generate the output below so you can review your configuration.
 

demo@k8s-master1:~/ag$ kubectl get all --namespace ag1

NAME                                  READY   STATUS      RESTARTS   AGE

pod/mssql-initialize-mssql1-klhzh     0/1     Completed   0          12h

pod/mssql-initialize-mssql2-8vd6r     0/1     Completed   0          12h

pod/mssql-initialize-mssql3-4tjvd     0/1     Completed   0          12h

pod/mssql-operator-6d88564d97-hqz8r   1/1     Running     0          12h

pod/mssql1-0                          2/2     Running     0          12h

pod/mssql2-0                          2/2     Running     0          12h

pod/mssql3-0                          2/2     Running     0          12h

 

NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE

service/ag1             ClusterIP   None             <none>        1433/TCP,5022/TCP   12h

service/ag1-primary     NodePort    10.110.116.77    <none>        1433:30625/TCP      176m

service/ag1-secondary   NodePort    10.100.132.228   <none>        1433:30380/TCP      176m

service/mssql1          NodePort    10.103.126.32    <none>        1433:32572/TCP      12h

service/mssql2          NodePort    10.102.175.227   <none>        1433:30001/TCP      12h

service/mssql3          NodePort    10.107.215.217   <none>        1433:31539/TCP      12h

 

NAME                             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE

deployment.apps/mssql-operator   1         1         1            1           12h

 

NAME                                        DESIRED   CURRENT   READY   AGE

replicaset.apps/mssql-operator-6d88564d97   1         1         1       12h

 

NAME                      DESIRED   CURRENT   AGE

statefulset.apps/mssql1   1         1         12h

statefulset.apps/mssql2   1         1         12h

statefulset.apps/mssql3   1         1         12h

 

NAME                                COMPLETIONS   DURATION   AGE

job.batch/mssql-initialize-mssql1   1/1           19s        12h

job.batch/mssql-initialize-mssql2   1/1           18s        12h

job.batch/mssql-initialize-mssql3   1/1           17s        12h


Want to dump your entire configuration to YAML? Use this:
 

kubectl get all --namespace ag1 -o yaml

 
Well, I hope you enjoyed this lengthy walkthrough on how to create a SQL Server Always On Availability Group in Kubernetes using on-premises virtual machines. I tried to do two things with this post: first, show you how to set this up, and second, introduce the SQL Server pro to core Kubernetes constructs.

Please feel free to contact me with any questions regarding Linux, Kubernetes or other SQL Server related issues at : aen@centinosystems.com


Getting Started with Installing Kubernetes On-Prem

Let’s get you started on your Kubernetes journey by installing Kubernetes on premises in virtual machines.

Kubernetes is a distributed system. You will be creating a cluster with a master node that is in charge of all operations in your cluster, and in this walkthrough we’ll create three workers which will run our applications. This cluster topology is, by no means, production ready. If you’re looking for production cluster builds, check out the Kubernetes documentation here and here. The primary components that need high availability in a Kubernetes cluster are the API Server, which controls the state of the cluster, and the etcd database, which stores the persistent state of the cluster. You can learn more about Kubernetes cluster components here.

In our demonstration here, the master is where the API Server, etcd, and the other control plane functions will live. The workers will be joined to the cluster and run our application workloads.

Get your infrastructure sorted

I’m using four Ubuntu virtual machines in VMware Fusion on my Mac, each with 2 vCPUs and 2GB of RAM, running Ubuntu 16.04.5. Ubuntu 18 requires a slightly different install, documented here; there you will add the Docker repository and then install Docker from it. The instructions below get Docker from Ubuntu’s repository.

  • k8s-master1 – 172.16.94.15
  • k8s-node1 – DHCP
  • k8s-node2 – DHCP
  • k8s-node3 – DHCP

Ensure that each host has a unique name and that all nodes have network reachability to each other. Take note of the IPs, because you will need to log into each node with SSH. If you need assistance getting your environment ready, check out my training on Pluralsight to get you started here! I have courses on installation and command line basics, all the way up through advanced topics on networking and performance.

Another requirement, which Klaus Aschenbrenner reminded me of, is that you need to disable swap on any system that will run the kubelet, which in our case is all systems. To do so, turn swap off with sudo swapoff -a and edit /etc/fstab, removing or commenting out the swap volume entry.
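
Here’s one way to do that, assuming a standard fstab with a single swap entry; the sed pattern is just an example, so double-check your /etc/fstab afterwards.

sudo swapoff -a                              # turn swap off now
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # comment out the swap entry so it stays off after a reboot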

Overview of the cluster creation process

  • Install Kubernetes packages on all nodes
    • Add Kubernetes’ apt repositories
    • Install the required software for Kubernetes
  • Download deployment files for your pod network
  • Create a Kubernetes cluster on the master
    • We’re going to use a utility called kubeadm to create our cluster with a basic configuration
  • Install a Pod Network
  • Join our three worker nodes to our cluster

Install Kubernetes

Let’s start off with installing Kubernetes on to all of the nodes in our system. This is going to require logging into each server via SSH, adding the Kubernetes apt repositories and installing the correct packages. Perform the following tasks on ALL nodes in your cluster, the master and the three workers. If you add more nodes, you will need to install these packages on those nodes. 

Add the gpg key for the Kubernetes Apt repository to your local system

demo@k8s-master1:~$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -


Add the Kubernetes Apt repository to your local repository locations

demo@k8s-master1:~$ sudo bash -c 'cat <<EOF >/etc/apt/sources.list.d/kubernetes.list

> deb https://apt.kubernetes.io/ kubernetes-xenial main

> EOF'


Next, we’ll update our apt package lists

demo@k8s-master1:~$ sudo apt-get update

 
Install the required packages
 

demo@k8s-master1:~$ sudo apt-get install -y kubelet kubeadm kubectl docker.io

 
Then we need to tell apt to not update these packages. In Kubernetes, cluster upgrades will be managed by…you guessed it…Kubernetes

demo@k8s-master1:~$ sudo apt-mark hold kubelet kubeadm kubectl docker.io

 
Here’s what you just installed
  • Kubelet – On each node in the cluster, this is in charge of starting and stopping pods in response to the state defined on the API Server on the master 
  • Kubeadm – Primary command line utility for creating your cluster
  • Kubectl – Primary command line utility for working with your cluster
  • Docker – Remember that Kubernetes is a container orchestrator, so we’ll need a container runtime to run your containers. We’re using Docker. You can use other container runtimes if required

Download the YAML files for your Pod Network

Now, only on the master, let’s download the YAML deployment files for your Pod network and get our cluster created. Networking in Kubernetes is different than what you’d expect. For Pods on different nodes to be able to communicate with each other on the same IP network, you’ll want to create a Pod network, which is essentially an overlay network that gives you a uniform address space for Pods to operate in. The decision of which Pod network to use, or even if you need one, is very dependent on your local or cloud infrastructure. For this demo, I’m going to use the Calico Pod network overlay. The code below will download the Pod definition manifests in YAML, and we’ll deploy those into our cluster. This starts up a container on our systems in what’s called a DaemonSet. A DaemonSet is a Kubernetes object that will start the specified container on all or some of the nodes in the cluster. In this case, the Calico network Pod will be deployed on all nodes in our cluster. So as we join nodes, you might see some delay in nodes becoming Ready…this is because the container is being pulled and started on the node.
 
Download the YAML for the Pod network
 

demo@k8s-master1:~$ wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml

demo@k8s-master1:~$ wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml


If you need to change the address range of your Pod network, edit calico.yaml: look for the name CALICO_IPV4POOL_CIDR and set its value to your specified CIDR range. It’s 192.168.0.0/16 by default.
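
The section you’re looking for in calico.yaml is an environment variable on the calico-node container; it looks like this (shown with the default range).

- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"    # change this if 192.168.0.0/16 overlaps an existing network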

Creating a Kubernetes Cluster

Now we’re ready to create our Kubernetes cluster, we’re going to use kubeadm to help us get this done. It’s a community-based tool that does a lot of the heavy lifting for you.
 
To create a cluster, do this; here we’re specifying a CIDR range to match the one in our calico.yaml file.
 

demo@k8s-master1:~$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16


What’s happening behind the scenes with kubeadm init:
  • Creates a certificate authority – Kubernetes uses certificates to secure communication between components and also to verify the identity of hosts in the cluster
  • Creates configuration files – On the master, this will create configuration files for various Kubernetes cluster components
  • Pulls control plane images – the services implementing the cluster components are deployed into the cluster as containers. Very cool! You can, of course, run these as local system daemons on the hosts, but Kubernetes suggests keeping them inside containers
  • Bootstraps the control plane pods – starts up the pods and creates static Pod manifests on the master so they start automatically when the master node starts up
  • Taints the master to just system pods – this means the master will run (schedule) only system Pods, not user Pods. This is ideal for production. In testing, you may want to untaint the master; you’ll really want to do this if you’re running a single node cluster. See this link for details, and there’s a one-liner for it just after this list.
  • Generates a bootstrap token – used to join worker nodes to the cluster
  • Starts any add-ons – the most common add-ons are the DNS pod and the master’s kube-proxy
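
For a test or single-node cluster, removing that taint is a one-liner. This uses the master role taint key kubeadm applies in this version; verify the taint name on your cluster with kubectl describe node if it doesn't match.

kubectl taint nodes --all node-role.kubernetes.io/master-
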
If you see this, you’re good to go! Keep that join command handy. We’ll need it in a second.
 

Your Kubernetes master has initialized successfully!

…output omitted

You can now join any number of machines by running the following on each node

as root:

  kubeadm join 172.16.94.15:6443 --token 2a71vm.aat5o5vd0eip9yrx --discovery-token-ca-cert-hash sha256:57b64257181341928e60548314f28aa0d2b15f4d81bf9ae9afdae0cee6baf247

The output from your cluster creation is very important, it’s going to give you the code needed to access your cluster as a non-root user, the code needed to create your Pod network and also the code needed to join worker nodes to your cluster (just go ahead and copy this into a text file right now). Let’s go through each of those together.

Configuring your cluster for access from the master node as a non-privileged user

This will allow you to log into your system with a regular account and administer your cluster.

mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

Create your Pod network

Now that your cluster is created, you can deploy the YAML files for your Pod network. You must do this prior to adding more nodes to your cluster and certainly before starting any Pods on those nodes. We are going to use kubectl apply -f to deploy the Pod network from the YAML files we downloaded earlier.

demo@k8s-master1:~$ kubectl apply -f rbac-kdd.yaml

clusterrole.rbac.authorization.k8s.io/calico-node created

clusterrolebinding.rbac.authorization.k8s.io/calico-node created

demo@k8s-master1:~$ kubectl apply -f calico.yaml

configmap/calico-config created

service/calico-typha created

deployment.apps/calico-typha created

poddisruptionbudget.policy/calico-typha created

daemonset.extensions/calico-node created

serviceaccount/calico-node created

customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created

customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created


Before moving forward, check for the creation of the Calico pods and also the DNS pods; once these are created and their STATUS is Running, you can proceed. In this output you can also see the other components of your Kubernetes cluster: the containers running etcd, the API Server, the Controller Manager, kube-proxy and the Scheduler.

demo@k8s-master1:~$ kubectl get pods --all-namespaces

NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE

kube-system   calico-node-6ll9j                     2/2     Running   0          2m5s

kube-system   coredns-576cbf47c7-8dgzl              1/1     Running   0          9m59s

kube-system   coredns-576cbf47c7-cc9x2              1/1     Running   0          9m59s

kube-system   etcd-k8s-master1                      1/1     Running   0          8m58s

kube-system   kube-apiserver-k8s-master1            1/1     Running   0          9m16s

kube-system   kube-controller-manager-k8s-master1   1/1     Running   0          9m16s

kube-system   kube-proxy-8z9t7                      1/1     Running   0          9m59s

kube-system   kube-scheduler-k8s-master1            1/1     Running   0          8m55s

 

Joining worker nodes to your cluster

Now on each of the worker nodes, let’s use kubeadm join to join the worker nodes to the cluster. Go back to the output of kubeadm init and copy the join command from that output, and be sure to put sudo on the front when you run it on each node. The process below is called a TLS bootstrap. This securely joins the node to the cluster over TLS and authenticates the host with server certificates.
 

demo@k8s-node1:~$ sudo kubeadm join 172.16.94.15:6443 --token 2a71vm.aat5o5vd0eip9yrx --discovery-token-ca-cert-hash sha256:57b64257181341928e60548314f28aa0d2b15f4d81bf9ae9afdae0cee6baf247

[preflight] running pre-flight checks

[discovery] Trying to connect to API Server "172.16.94.15:6443"

[discovery] Created cluster-info discovery client, requesting info from "https://172.16.94.15:6443"

[discovery] Requesting info from "https://172.16.94.15:6443" again to validate TLS against the pinned public key

[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.94.15:6443"

[discovery] Successfully established connection with API Server "172.16.94.15:6443"

[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace

[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"

[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"

[preflight] Activating the kubelet service

[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap…

[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-node1" as an annotation

 

This node has joined the cluster:

* Certificate signing request was sent to apiserver and a response was received.

* The Kubelet was informed of the new secure connection details.
 

Run 'kubectl get nodes' on the master to see this node join the cluster.

 
If you didn’t keep the token or the CA cert hash from the earlier steps, go back to the master and run these commands. Also note that the join token is only valid for 24 hours.
 
To get the current join token
 

demo@k8s-master1:~$ kubeadm token list


To get the CA Cert Hash
 

demo@k8s-master1:~$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

 
Back on the master, check on the status of your nodes joining the cluster. These nodes are currently NotReady; behind the scenes they’re pulling the Calico container and setting up the Pod network.
 

demo@k8s-master1:~$ kubectl get nodes

NAME          STATUS     ROLES    AGE    VERSION

k8s-master1   Ready      master   14m    v1.12.2

k8s-node1     NotReady   <none>   100s   v1.12.2

k8s-node2     NotReady   <none>   96s    v1.12.2

k8s-node3     NotReady   <none>   94s    v1.12.2

 
And here we are with a fully functional Kubernetes cluster! All nodes joined and Ready.

demo@k8s-master1:~$ kubectl get nodes

NAME          STATUS   ROLES    AGE     VERSION

k8s-master1   Ready    master   15m     v1.12.2

k8s-node1     Ready    <none>   2m34s   v1.12.2

k8s-node2     Ready    <none>   2m30s   v1.12.2

k8s-node3     Ready    <none>   2m28s   v1.12.2


In our next post, we’ll deploy a SQL Server Pod into our freshly created Kubernetes cluster.
 
Please feel free to contact me with any questions regarding Linux or other SQL Server related issues at: aen@centinosystems.com

My Current Training Courses on Pluralsight!

Here’s a run down of the Linux training that I have available on Pluralsight!

Just getting started! 

Understanding and Using Essential Tools for Enterprise Linux 7 – If you heard what Linux is and want to get started this is the place to be. We’ll cover installation, command line basics, the file system, text editors and more! This is my most popular course.

SQL Server on Linux Administration Fundamentals – SQL Server is available for Linux and Docker! This course is targeted towards both the SQL Server DBA and the Linux professional to get you started using SQL Server on Linux. We cover architecture, administration basics, Tools and backup and recovery!

Provisioning Microsoft Azure Virtual Machines – Need to deploy a VM in the cloud, start here! We cover the basics of getting started and walk through how to deploy VMs both in the portal and at the command line. We also cover Azure Disk storage and building custom virtual machine images for consistent repeatable deployments. 

Deeper cuts!

Once you’ve mastered the basics, let’s move you along in your Linux journey and get you ready to run your applications and servers in production!

LFCE: Advanced Network and System Administration – If you’re going to run Linux in production, you should watch this course. We cover managing services, system performance, package management and sharing files with Samba and NFS. A must see. This is my second most popular course.

LFCE: Advanced Linux Networking – Tired of the network team saying "I can ping it"? If so, watch this course. We’ll learn about the OSI model, addressing and subnets, ARP and DNS, how routing works, TCP internals and my network troubleshooting model!

LFCE: Network and Host Security – Speaking of production ready, this course is a must. We cover security concepts and architecture, Linux firewalls including iptables and firewalld; we also cover remote access with SSH and how to reach applications behind firewalls with tunneling services through SSH.

For those with services on the internet!

LFCE: Linux Service Management – HTTP Services – Need to set up a web server, watch this course and learn how. We’ll cover Apache’s architecture, how DNS works, installation and configuration, implementing Apache security features and troubleshooting and logging.

LFCE: Linux Service Management – Advanced HTTP Services – Once you have that web server up and online, learn how to really build scalable internet architecture with this course! We cover proxy and caching techniques and also take a deep dive into Apache modules.

LFCE: Linux Service Management – Advanced Email Services – If you’re a consulting customer of mine and you ask me to build you an email server I’m going to tell you to give me your credit card because we’re going to buy a cloud based solution for that. But for those of you who need to know how to build SMTP based mail environments this is your course! We’ll cover installation and configuration of Postfix as a mail server, we cover relay services and how to access your mail server securely.

SQL Server on Linux – External Memory Pressure with 2019 CTP2

In this blog post we’re going to revisit how SQL Server on Linux responds to external memory pressure. This is a very long post, and it ends with me not knowing exactly what’s going on…but the journey is pretty fun…let’s go!
 
On Windows-based SQL Server systems we’ve become accustomed to the OS signaling to SQL Server that there’s a memory shortage. When signaled, SQL Server will kindly start shrinking its memory caches, including the buffer pool, to maintain overall system stability and usability. Well, that story is a little different in SQL Server on Linux. Last year I wrote a similar post on SQL Server 2017 RTM and how it reacted to external memory pressure; check that out here! That was quite a dramatic story, you can literally cause SQL Server to swap nearly its entire process address space out to disk! Now, let’s look and see how SQL Server on Linux responds to external memory pressure in SQL Server 2019 CTP2.

The Scenario

Our SQL Server on Linux system is a CentOS 7.5 VM, running SQL Server 2019 (CTP2). There is 12GB of physical memory and 80% of that, about 9.2GB, is dedicated to SQL Server via mssql-conf, the default configuration. I’ve also set max server memory to 8GB for the SQL Server instance.

So let’s start up SQL Server on Linux 2019 CTP2 and query SQL Server’s cache sizes when there’s no data loaded. I’m going to look in sys.dm_os_memory_clerks for memory allocation accounting information. In Figure 1 we can get a feel for the memory distribution across the system’s caches. Special shoutout to the new "Azure Data Studio" I’m using in these demos.


 Figure 1 – Top memory consumers at system startup

Now, I’m going to load an 8GB table into the buffer pool, maxing out SQL Server’s max memory instance setting, and then we’ll look at the allocations again. In Figure 2, we can see the buffer pool is the dominant consumer of memory and that our database is the dominant consumer in the buffer pool. So far so good, eh?


Figure 2 – Top memory consumers after server reached maximum memory

Getting Memory Information From Linux

We can use tools like ps, top and htop to look at our virtual and physical memory allocations. We can also look in the /proc virtual file system for our process and look at the status file. In here we’ll find the point-in-time status of a process and, most importantly, the types of memory allocations for a process. We’ll get granular data on the virtual memory allocations and also the resident set size of the process. Here are the interesting values in the status file we’re going to focus on today.

  • VmSize – total current virtual address space of the process
  • VmRSS – total amount of physical memory currently allocated to the process
  • VmSwap – total amount of virtual memory currently paged out to the swap file (disk)

We can use the command pidof sqlservr to find the process ID for our SQL Server on Linux process; in our case it’s 14689. When running SQL Server on Linux there are always two sqlservr processes; we’re going to look at the one with the higher process ID, as the one with the lower PID is the watchdog process.

Now to access the status information for our process we look in /proc/14689/status

Below is the output from our process’ status file, filtering for the values we’re interested in. You can see about 16.4GB of virtual memory (VmSize), 7.9GB of which is in actual RAM (VmRSS), and 0MB in the swap file (VmSwap). Wait a second, 16GB for VmSize? Yep, 16GB for VmSize. That’s the reserved allocation for the whole process. More on that in a second…
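
If you just want to pull those three values, a quick grep against the status file does the trick; 14689 is the PID from above, so substitute your own.

grep -E 'VmSize|VmRSS|VmSwap' /proc/14689/status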

VmSize:  17165472 kB

VmRSS:    8579016 kB

VmSwap:         0 kB 


Table 1 – Process memory for SQL Server 2019

Now, if we look back at the previous post here on SQL Server 2017 RTM, the memory distribution was a bit different. We see about 10.16GB of memory in VmSize.

VmSize:  10661380 kB

VmRSS:    8201240 kB

VmSwap:         0 kB


Table 2 – Process memory for SQL Server 2017

Phantom Memory Allocation?

Now, I very rarely post something when I don’t have an answer for a specific behavior, but in this case I don’t. My thinking is that the 16GB is the reserved allocation for the whole process. Linux uses a demand paging allocation system, so the majority of the process address space is just a reservation and isn’t backed by actual pages; it’s not *really* consuming physical memory. But what I do find interesting is that the process seems to carry that allocation forward: as I loaded data in, the VmSize went from 8GB (which is the VmSize when SQL Server starts) to 16GB (when I loaded the 8GB table). I’ve reported this to the product team and I am awaiting an answer on the ~8GB discrepancy between the two tests.

Let’s Add Some Pressure

Using a small C program I wrote, I use calloc to allocate large sections of contiguous memory and then have the program continually write data into the memory buffer to ensure those pages stay in physical memory. Using this program, let’s allocate 7GB of RAM. I’m choosing this value because it’s going to cause my system to exceed its physical memory but NOT exceed the amount of virtual memory in my system when including the swap file’s size, which is 5.8GB on my system. Things can get really dicey if you exceed physical and virtual memory size; processes will start to get killed. Microsoft documents a case here. Just so you know how important adding a monitor for external memory pressure is: when I allocated 7.5GB and exceeded physical + virtual memory, Linux killed both the memory allocation program and SQL Server to protect the stability of the system. The OOM killer is a thing to look out for!

Now, let’s look at the memory accounting information at /proc/PID/status for my memory allocator program. In the output below you can see we have just about 7GB of virtual memory (VmSize) allocated and nearly the same amount of memory that’s resident in physical memory (VmRSS) and no data in the swap file.

VmSize:  7344248 kB

VmRSS:   7340312 kB

VmSwap:        0 kB


Table 3 – Process memory for external memory allocation program
 
Now, in SQL Server 2017, having that 7GB program running would cause Linux to need to make room in physical memory for this process. Linux does this by swapping least recently used pages from memory out to disk. So under external memory pressure, let’s look at the SQL Server process’ memory allocations according to Linux. In the output below we see we still have a VmSize of around 10GB, but our VmRSS value has decreased dramatically. In fact, our VmRSS is now only 2.95GB, and VmSwap has increased to 5.44GB. Wow, that’s a huge portion of the SQL Server process swapped to disk.
 

VmSize:  10700328 kB

VmRSS:    3098456 kB

RssAnon:  3096308 kB

VmData:  10442840 kB

VmSwap:   5711856 kB

Table 4 – Process memory for SQL Server 2017 on Linux under external memory pressure

In SQL Server 2019, there’s a different outcome! In the data below we see our 16GB VmSize, which won’t change much because of the virtual address space for the process. With that large external process running, SQL Server reduced VmRSS from 7.9GB (from Table 1) to 2.8GB, placing about 4.68GB in the swap file. That doesn’t sound much better, does it? I thought SQL Server was going to react to the external memory pressure…let’s keep digging and ask SQL Server what it thinks about this.

VmSize:  17159552 kB

VmRSS:    2938684 kB

VmData:  16806460 kB

VmSwap:   4911120 kB


Table 5 – Process memory for SQL Server 2019 on Linux under external memory pressure

What’s SQL Server Think About This?

In SQL Server 2017 on Linux, we saw a large chunk of the process address space swap out, and those pages stayed resident in SQL Server’s buffer pool, or really in any part of the SQL Server process address space, since the Linux kernel is an equal opportunity swapper. In other words, the caches managed by SQL Server didn’t know whether that memory was resident in physical memory or was actually on disk. In Figure 3, we see the top memory consumers after external memory pressure; our buffer pool is still fully populated with our table data.


Figure 3 – Top memory consumers after external memory pressure in SQL Server 2017

OK, let’s look at what happens in SQL Server 2019 with external memory pressure. Looking at the memory distribution, we see that the memory clerk for SQL Server’s buffer pool reduced its memory footprint from 7,227MB (the value from way back in Figure 2, when the buffer pool was first loaded with our large table). We then applied external memory pressure with our 7GB process, and now we see the buffer pool is reduced to 3,502MB, a reduction of 3,725MB. Cool, it looks like we dumped part of our buffer pool to protect the system, but not the whole thing; there’s still 3,502MB in memory. But the SQL Server 2019 on Linux process still swapped out about that much data (Table 5)…what’s going on? I still don’t know. Let’s keep digging.
 

Figure 4 – Top memory consumers after external memory pressure in SQL Server 2019

Now, in SQL Server 2017 on Linux, Resource Monitor didn’t exist. Let’s look at what we have in SQL Server 2019: using the query from Amit Banerjee and Sudarshan Narasimhan in this post here, let’s see what happened according to the newly implemented Resource Monitor for SQL Server 2019 on Linux.
 
 
Figure 5 – Resource Monitor output before and after external memory pressure
 
From the output of this query, we can see that the System Indicator is 2 during periods of external memory pressure. This is a signal from Resource Monitor to SQL Server to dump its caches to protect the health of the overall system. We certainly observed that in the output in Figure 4. But what we also found is that Linux is placing 4.68GB of memory into the swap file and that SQL Server is holding on to 3,502MB of memory in the buffer pool while the VmRSS of the process is 2.8GB (Table 5). So the question remains: why is SQL Server swapping so much of its memory out to disk and holding on to so much memory in the buffer pool even though it’s seeing the low memory signal from Resource Monitor? This is further backed up by the fact that SQL_CommitedMemory_KB stays relatively stable in the face of the external memory pressure, when the expectation is that it would reduce. So it’s like it’s not quite finishing the job of deallocating the memory. Let’s ask Linux what it thinks about this.

Resource Monitor on Linux – How I think it works…

If you’ve read this blog before you know I like to use strace to find out how applications interact with the OS via system calls. Well, let’s use that technique to try to figure out what happens when SQL Server starts dumping its cache under external memory pressure.
 
In the last section, we learned that Resource Monitor can track high and low memory conditions in Linux. In this section, let’s see how it interacts with the Linux memory management system to deallocate memory.
 
In the output below (Table 6) you can see the aggregated system calls when I’m applying external memory pressure to SQL Server on Linux. From that output, it looks like SQL Server on Linux uses memory mapped files, a file access technique where files are mapped directly into a process’ memory address space. The process can then access the file via direct memory addressing rather than using system calls. This type of access actually maps quite well to the internal file structures used in SQL Server data files.
 
During external memory pressure, madvise is one of the most frequently called system calls. This system call is used by applications to advise (hint) the kernel on what to do with ranges of memory under certain conditions. In the last section, we observed that Resource Monitor can sense the external memory pressure and signals SQL Server to clear its caches, and this is likely the implementation of how those caches are dumped.
 
Under memory pressure, the madvise system call is called with the MADV_DONTNEED flag for large address ranges in SQL Server’s process address space, the memory mapped file. The MADV_DONTNEED flag tells the kernel that the application does not expect to access these pages in the near future and they can be discarded and eventually freed rather than swapped out to disk. See the kernel source code here for how it works.

madvise(0x69274a000, 8192, MADV_DONTNEED) = 0

 
My theory is that this is how Resource Monitor signals to the OS that these pages are no longer needed and should be deallocated; given the semantics of madvise, the pages are marked as not needed, and we see the reduction in the size reported by SQL Server’s memory clerks. But what I find interesting is the amount of memory that’s still hitting swap when we look at the memory accounting information for the individual processes and the IO profile of the system during external memory pressure. So it seems like the kernel or the application is holding onto pages and they’re hitting swap rather than being deallocated.
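The per-process accounting I keep referring to is easy to check from /proc; resident versus swapped size for the SQL Server process looks like this:

# VmRSS is the resident set size; VmSwap is how much of the process' memory is sitting in swap.
SQLPID=$(pgrep -x sqlservr | head -1)
grep -E 'VmRSS|VmSwap' /proc/"$SQLPID"/status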
 
This is just my theory and I’m trying to bring the pieces together in this emerging technology since it’s still in CTP. So if you have some insight as to what’s happening at this layer, I’d love to hear from you :) 
 

strace -p PID -f -c   # attach to the SQL Server PID, follow its threads (-f), and print a summary count of system calls on exit (-c)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.98  575.483472       12188     47217     11425 futex
  8.24   62.370591      275976       226           epoll_wait
  7.73   58.578205     2253008        26        14 restart_syscall
  3.44   26.050226     2170852        12           io_getevents
  1.60   12.096475      549840        22           nanosleep
  1.55   11.726022      266501        44        44 rt_sigtimedwait
  0.71    5.387356          14    389517           clock_gettime
  0.49    3.688034          24    154405           madvise
  0.12    0.944458         803      1176           munmap
  0.07    0.495154         343      1445           mmap
  0.06    0.444399        7796        57           open
  0.01    0.062301        1093        57           read
  0.00    0.013327         579        23        23 stat
  0.00    0.009003          46       196           gettid
  0.00    0.005058          43       119           sched_setaffinity
  0.00    0.004188          68        62           clone
  0.00    0.004018          65        62           set_robust_list
  0.00    0.003902          36       109           rt_sigprocmask
  0.00    0.003194          56        57           fstat
  0.00    0.002914          47        62           sched_getaffinity
  0.00    0.002331          38        62           sigaltstack
  0.00    0.001805          29        62           arch_prctl
  0.00    0.001575          28        57           close
  0.00    0.001182         394         3           io_submit
  0.00    0.000672          42        16           sched_yield
  0.00    0.000506          22        23           rt_sigaction
  0.00    0.000011          11         1           fdatasync
------ ----------- ----------- --------- --------- ----------------
100.00  757.380379                595118     11506 total

 
Table 6: Aggregated system calls for SQL Server on Linux under external memory pressure
 

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.52  569.206916       33673     16904      4444 futex
 12.46   99.164048      819538       121           epoll_wait
  6.90   54.920907     2387866        23        17 restart_syscall
  2.73   21.715336        1084     20033           io_getevents
  2.15   17.123567     1902619         9           nanosleep
  2.07   16.494124      868112        19        19 rt_sigtimedwait
  1.77   14.122895        1070     13202           io_submit
  0.22    1.780524         218      8180           munmap
  0.10    0.769185          93      8262           mmap
  0.06    0.502749         125      4026           clock_gettime
  0.00    0.006171         386        16           sched_setaffinity
  0.00    0.005702          62        92           read
  0.00    0.004128         197        21           fstat
  0.00    0.003996         118        34           sched_yield
  0.00    0.003961         165        24           gettid
  0.00    0.003112         346         9           arch_prctl
  0.00    0.002615         291         9           sched_getaffinity
  0.00    0.002528          84        30           rt_sigprocmask
  0.00    0.002045          85        24           close
  0.00    0.001929          92        21           open
  0.00    0.001924         214         9           clone
  0.00    0.001752         195         9           set_robust_list
  0.00    0.001647          97        17           madvise
  0.00    0.001198         599         2           fdatasync
  0.00    0.000989         110         9           sigaltstack
  0.00    0.000632          57        11        11 stat
  0.00    0.000446          45        10           rt_sigaction
  0.00    0.000351          18        19           write
  0.00    0.000316          11        29           epoll_ctl
  0.00    0.000310          44         7           writev
  0.00    0.000159          11        15         7 readv
  0.00    0.000051          26         2           socket
  0.00    0.000021          11         2           setsockopt
  0.00    0.000014           4         4           getcpu
  0.00    0.000009           3         3           getsockname
  0.00    0.000009           5         2           getpeername
  0.00    0.000004           4         1           accept
  0.00    0.000004           2         2           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00  795.846274                 71212      4498 total

 
Table 7: Aggregated system calls for SQL Server on Linux under external memory pressure
 
 

My Experiences Tech Editing Pro SQL Server on Linux

The Opportunity

Earlier this year I received an email from Jonathan Gennick, an editor at Apress books. The subject of the email was “Tech edit a book” and he asked if I was familiar with SQL Server on Linux. I excitedly replied “yes…I think I’m your guy for this”. In 2017, I had the opportunity to tech edit another book on SQL Server on Linux but had to back out of that project, so this was a fantastic second chance to work on a topic I really want to participate in and help move forward. I replied to Jonathan and asked who the author is…he replied…the author is Bob Ward.

The Author

If you’re active in the SQL Server community you know Bob. Ever present at the major conferences, he is a Principal Architect on the SQL Server Product Team. Bob has a unique combination of world-class engineering skills and the ability to communicate extremely complex technical details in simple, approachable terms. My first experience watching Bob speak was at SQL Intersection 2014 Fall edition in Las Vegas, where I saw one of his legendary 3-hour sessions on SQL Server internals. He delivered a session on SQL Server’s memory management system. I remember being amazed by how he took a very complex topic and made it approachable and understandable to the audience; I even tweeted about it here.

The Book

Knowing what an exceptional communicator Bob is, having the opportunity to tech edit a book written by him was a no-brainer. I wanted to do this project. We kicked off the project and the first chapter was delivered; its title is “Why SQL Server on Linux”. This chapter alone makes the book worth every penny. In it, you’ll learn about the journey of putting SQL Server on Linux, with first-hand anecdotes straight from Bob’s experiences getting SQL Server working on Linux. I remember reading that first chapter and thinking, this is a stellar intro to the book; I literally didn’t want to stop reading.

In each chapter, Bob brings his years of experience to help tell the reader what they really need to know about SQL Server on Linux, how it works, and how to have a successful SQL Server on Linux deployment. Throughout the book, Bob adds historical anecdotes describing how the various facets of SQL Server got to where they are today. He also calls out the work of the PMs and members of the product team for the features and innovations they’re responsible for as they’re discussed in the chapters of the book. Bob even takes time to credit various members of the SQL Server community and their contributions. This book is as much about the people and the journey as the technology being discussed.

Getting the Book

The book is still in the final stages of the publishing process and it’s currently available for pre-order online at Apress’ site and also on Amazon. Oh, and at the time of me writing this post, the book is currently number 1 in the “Linux Servers” new releases category and number 18 in the overall “Linux” new releases category on Amazon! Congrats Bob on the book and the successful release!

Bob wrote a blog post about his experiences writing the book, check it out here!

Pro SQL Server on Linux

I’m Speaking at SQLSaturday Cambridge!

Speaking at SQLSaturday Cambridge!

I’m proud to announce that I will be speaking at SQL Saturday Cambridge on September 8th 2018! And wow, 748 SQL Saturdays! This one won’t let you down. Check out the amazing schedule of International Experts and Microsoft MVPs!

If you don’t know what SQLSaturday is, it’s a whole day of free SQL Server training available to you at no cost!

If you haven’t been to a SQLSaturday, what are you waiting for! Sign up now!

SQLSaturday #748 - Cambridge 2018

This year I have TWO sessions!

1. Monitoring Linux Performance for the SQL Server Admin

So you’re a SQL Server administrator and you just installed SQL Server on Linux. It’s a whole new world. Don’t fear, it’s just an operating system. It has all the same components Windows has and in this session we’ll show you that. We will look at the Linux operating system architecture and show you where to look for the performance data you’re used to! Further, we’ll dive into SQLPAL and how its architecture and internals enable high performance for your SQL Server. By the end of this session you’ll be ready to go back to the office with a solid understanding of performance monitoring for Linux systems and SQL Server on Linux. We’ll look at the core system components of CPU, disk, and memory and monitoring techniques for each.

2. Containers – You Better Get on Board

Containers are taking over, changing the way systems are developed and deployed…and that’s NOT hyperbole. Just imagine if you could deploy SQL Server or even your whole application stack in just minutes. You can do that, leveraging containers! In this session, we’ll get you started on your container journey by learning container fundamentals in Docker, then look at some common container scenarios and introduce deployment automation with Kubernetes. In this session we’ll look at container fundamentals with Docker, common container scenarios, and automation with Kubernetes.

Questions from PASS Marathon Containers

Thanks to everyone who attended the PASS Marathon Containers edition and to PASS for the opportunity to present. I received the Questions from the session and wanted to provide answers to the attendees and the community.
 
If you want to see the session again, check it out on YouTube. The decks are available online at http://www.centinosystems.com/blog/talks/
 
Here’s the list of questions from the session and my answers.
  • What do you mean it is not for production environment in Windows?
    • It’s my understanding that only Linux-based SQL Server containers are supported and that Windows-based containers are not. I’m looking for an official statement (like a web site link) from Microsoft on this but I’m having trouble finding one. Here is the official statement on running SQL Server on Linux in a container – https://bit.ly/2LYPeKh

  • When you say App1 on a container, is it just 1 executable/service or can be multiple of those on the same container?
    • Generally speaking you’ll want only one process in a container. A primary reason for using containers is agility, and a core way of achieving that is breaking dependencies by reducing what’s included inside the container. Technically speaking, you can have more than one process inside a container. In fact, SQL Server on Linux does. There’s the Watchdog process, then the actual SQL Server process. The output below is a process listing from inside a running SQL Server on Linux container. You can see PIDs 1 and 7 are processes inside the container.

      root 1   /opt/mssql/bin/sqlservr

      root 7   /opt/mssql/bin/sqlservr

       
      For the internals geeks out there, let’s look at a process listing on the host OS that’s running our container. From there we can see that the sqlservr process is a child process of containerd, which is managed by dockerd. This is the same SQL Server process that’s inside the container. In the first example you can see the impact of namespaces: the process IDs are rebased and start at 1, with the second SQL Server process at PID 7. In the output below you can see the PIDs are 2172 and 2213.
       

      root 1034 /usr/bin/dockerd

      root 1245 \_ docker-containerd 

      root 2154     \_ docker-containerd-shim -namespace moby -workdir 

      root 2172         \_ /opt/mssql/bin/sqlservr

      root 2213             \_ /opt/mssql/bin/sqlservr
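      If you want to generate these listings yourself, here’s roughly how I do it (the container name sql1 is an assumption, and ps has to be available inside the image):

      # Inside the container -- the PID namespace rebases the process IDs.
      docker exec sql1 ps -ef

      # On the host -- shows the same sqlservr processes as children of containerd/dockerd.
      ps -ef --forest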

       
  • Maybe I missed this part, how do I know what kind of image I could pull down?
    • In the demos I show how to use docker search to find images that are available from the Docker Hub. If you prefer a web browser experience, check out the Docker Hub to see what containers are available to you. Here’s the command to find the mssql-server images available in Docker Hub.
      • docker search mssql-server | sort
         
  • Does SQL Container fit into production environment?
    • Here is a link to the official word from Microsoft on running containers in production – https://bit.ly/2LYPeKh
    • What I want you to leave this session with is an introduction to containers, starting your journey on what’s next when using containers. To that end here are some of the things you’ll need to consider before using containers in production
      • Is your organization ready – Do the operational skills and technologies exist to support using containers in production?
      • Backup and recovery – Does the organization have a strong backup and recovery environment? How are you going to protect the data running in a SQL Server container? Luckily, it’s just SQL Server on Linux, so you can use the traditional technologies and techniques to back up your data.
      • Data persistency – Understand the underlying physical infrastructure and how to persist data in ways that keep it protected and performing well.
      • Orchestration – Are there technologies in place to manage the state of your containers: things like workload placement, starting, stopping and also data persistency?
         
  • How do SQL Containers work with High Availability and Disaster Recovery?
    • Backups and data persistency are primary concerns here. You still need to care for and feed your SQL Server databases just as if they were running on a full operating system. For HA, Microsoft has some guidance on how to use Kubernetes to provide HA services to your SQL Server containers here. What I want you to think about when using containers for SQL Server is that deploying a new container is VERY fast. We want to persist the data and be able to stand up a new container and mount our data inside it. Using this technique we can restore SQL Server services very quickly with a low RTO. That in itself is an interesting way to provide HA services without any additional technologies.
       
  • Is there a way to have persistent storage for the system databases (e.g. master database for logins and what not)?
    • In the demos during the session I defined a Docker Data Volume when we started the container and mounted it as /var/opt/mssql inside the container. When SQL Server on Linux starts for the first time, it copies the system databases from its package directories into /var/opt/mssql/data. Since this data is stored in the persistent data volume, if we stop and delete this container and start a new container pointing at that same Docker data volume, SQL Server will use those system databases when it starts up.

      Starting a SQL Server Container with a Docker Data Volume. The -v parameter names the volume sqldata1 and /var/opt/mssql is where it will be mounted inside the container.
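      Here’s a sketch of that docker run command; the image name and SA password are placeholders, substitute whatever you’re running:

      docker run \
        -e 'ACCEPT_EULA=Y' \
        -e 'SA_PASSWORD=YourStr0ngP@ssw0rd' \
        -p 1433:1433 \
        -v sqldata1:/var/opt/mssql \
        --name sql1 \
        -d microsoft/mssql-server-linux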

  • How about the backup of a container? Can it be like a VM snapshot?
    • You can snapshot the state of a container with docker commit. This will create a new image from the container, and that image can be used to create additional containers. But recall, containers are intended to be ephemeral; we really want to define the state of the container OUTSIDE of the container, in code. The things inside the container that require data persistency, like databases, should be taken care of using techniques like Docker Data Volumes, backups and other high availability scenarios.
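      A minimal example of that, assuming a running container named sql1:

      # Snapshot the container's current state as a new image, then start another container from it.
      docker commit sql1 sql1-snapshot
      docker run --name sql2 -p 1434:1433 -d sql1-snapshot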

 

Speaking at SQLSaturday Sacramento – 757!

Speaking at SQLSaturday Sacramento!

I’m proud to announce that I will be speaking at SQL Saturday Sacramento on July 28th 2018! And wow, 757 SQL Saturdays! This one won’t let you down. Check out the amazing schedule!

If you don’t know what SQLSaturday is, it’s a whole day of free SQL Server training available to you at no cost!

If you haven’t been to a SQLSaturday, what are you waiting for! Sign up now!

SQLSaturday #757 - Sacramento 2018

This year I have TWO sessions!

1. Monitoring Linux Performance for the SQL Server Admin

So you’re a SQL Server administrator and you just installed SQL Server on Linux. It’s a whole new world. Don’t fear, it’s just an operating system. It has all the same components Windows has and in this session we’ll show you that. We will look at the Linux operating system architecture and show you where to look for the performance data you’re used to! Further, we’ll dive into SQLPAL and how its architecture and internals enable high performance for your SQL Server. By the end of this session you’ll be ready to go back to the office with a solid understanding of performance monitoring for Linux systems and SQL Server on Linux. We’ll look at the core system components of CPU, disk, memory, and networking, monitoring techniques for each, and some of the new tools available.

2. Containers – You Better Get on Board

Containers are taking over, changing the way systems are developed and deployed…and that’s NOT hyperbole. Just imagine if you could deploy SQL Server or even your whole application stack in just minutes. You can do that, leveraging containers! In this session, we’ll get you started on your container journey by learning container fundamentals in Docker, then look at some common container scenarios and introduce deployment automation with Kubernetes. In this session we’ll look at container fundamentals with Docker, common container scenarios, and orchestration with Kubernetes.

Microsoft Most Valuable Professional – Data Platform for 2018-2019

Today, I’m proud to announce that I have been renewed as a Microsoft MVP – Data Platform for the 2018-2019 award year, my second MVP award. This is truly an honor and I’m humbled to be included in this group of exceptional data professionals. I really look forward to continuing to work with everyone in the MVP community and continuing to contribute to our unmatched SQL community!


What is an MVP?

Here’s the definition according to Microsoft

Microsoft Most Valuable Professionals, or MVPs, are technology experts who passionately share their knowledge with the community. They are always on the “bleeding edge” and have an unstoppable urge to get their hands on new, exciting technologies. They have very deep knowledge of Microsoft products and services, while also being able to bring together diverse platforms, products and solutions, to solve real world problems. MVPs make up a global community of over 4,000 technical experts and community leaders across 90 countries and are driven by their passion, community spirit, and quest for knowledge. Above all and in addition to their amazing technical abilities, MVPs are always willing to help others – that’s what sets them apart.

For 2018-2019, I have been named a Data Platform MVP, which means my technical specialization is on data products like SQL Server. The group of people that have received this award is quite small…by my count 367 worldwide and less than 100 in the US. I’m honored to be in this group of talented professionals.

Why I’m excited to be an MVP?

Honestly, the primary reason I’m excited to be an MVP is to give back (more). I’ve learned so much from other MVPs, and receiving this award will help me continue to build relationships with other MVPs and Microsoft employees to further help develop the Data Platform itself and the community that surrounds it.

At the start of 2016 I had set a goal of being an MVP in 5 years. I don’t know why I picked that number, but what I figured was…MVP would be validation of consistent, quality work for our community and being recognized for the work that I’ve contributed. Things like blogging, social media, public speaking and more. You learn a ton by teaching! 

To the people that have helped along the way, I thank you all!

Attempting to Run SQL on Linux Inside Windows Subsystem for Linux

Shawn Melton, MVP and dbatools contributor, had an issue last week running SQL Server on Linux inside of Windows Subsystem for Linux.

I didn’t want to leave a brother hanging so I spent this morning digging into this a little bit. 

Reproducing the Issue

The first thing I had to do was reproduce the issue. So on my Windows 10 test VM I installed the Windows Subsystem for Linux, steps to do so are here and I installed the Ubuntu app.

Then, I fired up a bash shell using WSL and then I installed SQL Server on Linux for Ubuntu as documented here
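For reference, the install boils down to something like the following; treat this as an approximation of the documented steps and grab the repo configuration for your Ubuntu version from the linked docs:

# Approximation of the documented Ubuntu install for SQL Server 2017.
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2017.list)"
sudo apt-get update && sudo apt-get install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup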

Now, I completed the setup of SQL Server on Linux using mssql-conf; when that program finishes, it attempts to start SQL Server on Linux. BOOM! I was able to reproduce the same error.

Looking at the error, I decided to see if I could run SQL Server on Linux from the shell as the user mssql. This would remove systemd and mssql-conf from the picture. Basically I wanted to see if I could get another, more descriptive, error to pop out. To do that we’ll need to change over to the mssql user with su.

And then change into the working directory for SQL Server on Linux and try to launch SQL Server.
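Roughly, those steps look like this (the mssql account has no login shell by default, hence the explicit -s; the working directory is an assumption on my part):

sudo su mssql -s /bin/bash      # switch to the mssql user with a usable shell
cd /var/opt/mssql               # SQL Server's working directory
/opt/mssql/bin/sqlservr         # try to start SQL Server directly from the shell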

Now, doing that…generates the same error! Here’s the error in a search-engine-friendly form :)

Digging a Little Deeper

So now with the same error output, I decided to give it a cursory pass with strace to see if I could find anything that would put us closer to why SQL Server on Linux won’t start when using Windows Subsystem for Linux.
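The trace itself was along these lines, run as the mssql user so the launch matches the failing case (the output file name is arbitrary):

# Trace the sqlservr startup, follow child processes, and write everything to a file for review.
strace -f -o /tmp/sqlservr.strace /opt/mssql/bin/sqlservr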

What you see in the strace output is the parent process creating the child sqlservr process and failing. In the first line of output you can see process 137 call clone and return process ID 139, which is how a parent process creates a child in Linux. Then process 139 tries to perform some setup operations like registering signal actions (rt_sigaction) and their corresponding routines to call when a signal is received by the process.

Now, the only error I found in the output is the prctl call, which returns an invalid argument. This system call is used to perform operations on a process. On my WSL system, the option being set, PR_SET_PTRACER, is for the Yama LSM subsystem, which normally lives under /proc/sys/kernel/yama. That path doesn’t exist on my Ubuntu WSL installation. I checked my full CentOS VMs and it exists. I checked a full Ubuntu installation and it’s there too.
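You can check for that interface yourself; on my WSL Ubuntu install this comes back empty, while on the full VMs it shows ptrace_scope:

# Look for the Yama LSM interface that the failing prctl(PR_SET_PTRACER, ...) call depends on.
ls /proc/sys/kernel/yama/ 2>/dev/null || echo "no yama interface here"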

After the error SQL Server calls tgkill and kills itself with the SIGABRT signal. A dump occurs and the program exits. 

What’s Really Happening?

Well, I think something is missing from Windows Subsystem for Linux. Is it the Yama stuff…perhaps. But clearly SQL Server isn’t happy with the environment and kills itself. I haven’t dug into WSL yet and I don’t know how it’s implemented, but there could also be something up at that level. Generally I don’t write blog posts where I don’t know exactly what’s going on, but I did want to let folks know that SQL Server on Linux doesn’t currently work inside Windows Subsystem for Linux.