Tag Archives: Linux

Instant File Initialization in SQL Server on Linux

Earlier this week Ned Otter (@NedOtter) brought up a question about Instant File Initialization on SQL Server on Linux; check out the thread here. I was up way too early in the morning, as I normally am, so I decided to poke around and see how it was done. SQL Server pros, here you'll see that you can get some deep internal information from the OS very easily. Hopefully with this blog post you'll be able to compare how this is done on Windows and draw the connections between the two platforms.

Let’s check it out…

SQL on Linux Internals

First, the internals of SQL Server on Linux leverage a process virtualization technique called SQLPAL. Inside SQLPAL is a Win32 environment custom tailored to support SQL Server. Next, SQLPAL needs a way to talk to Linux so that it can access the physical resources of the system, and it does this via something called the Host Extensions. Essentially, the Host Extensions map SQLPAL's Win32 and SQLOSv2 API calls to Linux system calls. In Linux, system calls provide access to system resources: things like CPU, memory, and network or disk I/O. And that's the flow: SQLPAL calls the Host Extensions, which issue system calls to interact with system resources.

Using strace to Collect System Calls

Knowing this, we can leverage Linux tools to see what SQL Server is doing when it interacts with the operating system. Specifically, we can use tools like strace to see which system calls it makes to perform certain tasks, and in this case we're going to look at how SQL Server on Linux implements Instant File Initialization.

Needless to say, do not reproduce this on a system that's important to you.

Attach strace to your currently running SQL Server process. Let’s find our SQL Server process

[root@server2 ~]# ps -aux | grep sqlservr

mssql      1414  3.0  0.4 198156 18244 ?        Ssl  06:23   0:04 /opt/mssql/bin/sqlservr

mssql      1416  6.3 15.9 1950768 616652 ?      Sl   06:23   0:08 /opt/mssql/bin/sqlservr

 

strace -t -f -p 1416 -o new_database.txt

Let's walk through the strace parameters here: -t adds a timestamp to each call, -f attaches strace to any threads forked from our traced process, -p is the process we want to trace, and -o writes the output to the named file.

Creating a Database with Instant File Initialization

With strace up and running, let's turn on the trace flags to enable output for Instant File Initialization and create a database that has a 100MB data file and a 100MB log file. Check out this post from Microsoft for more details on the trace flags. This database creation code is straight from their post; I changed the model database's data and log file sizes to 100MB each. Also, it's important to note Instant File Initialization is only for data files; log files are zeroed out due to requirements for crash recovery. We're going to see that in action in a bit…
 

DBCC TRACEON(3004,3605,-1)
GO

CREATE DATABASE TestFileZero
GO

EXEC sp_readerrorlog
GO

DROP DATABASE TestFileZero
GO

DBCC TRACEOFF(3004,3605,-1)

 
Once the database creation is done, stop your strace and let's check out the data gathered in the output file.

Poking Around in Linux Internals, Creating the MDF

Inside your output file you're going to see a collection of system calls. In Linux, a system call is the way a user space program asks the kernel to do some work, and in this case SQL Server on Linux is asking the kernel to create a data file, create a log file, and zero out the log file. So let's check out what happens when the MDF is created.
 
1630  09:03:28.393592 open("/var/opt/mssql/data/TestFileZero.mdf", O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660) = 154

First thing, that 1630, that’s the process ID of the thread that’s doing the work here. That PID is different than the one we attached strace to because it’s a thread created when we execute our database create statements.
 
Next you see a call to open; it's opening the file TestFileZero.mdf. The next parameters are flags that tell open what to do: O_RDWR opens the file for read/write access, O_CREAT creates the file, O_EXCL makes the call fail if the file already exists rather than overwrite it, and O_DIRECT bypasses the file system cache so I/O goes straight to the device. 0660 is the file mode, and the return value is 154…this is the file descriptor for the file created. A file descriptor (fd) is used to represent and provide access to the file that was just opened. We'll pass this to other system calls so they can interact with the file addressed by the fd.

1630  09:03:29.087471 fallocate(154, 0, 0, 104857600 <unfinished …> = 0

1630  09:03:29.087579 <… fallocate resumed> ) = 0


Next, we see a call to fallocate on file descriptor 154. The first 0 is the mode, which tells fallocate to allocate disk space within the range specified in the third and fourth parameters, offset and length respectively; here that's from 0 to 100MB. If you read the man page for fallocate, there is a FALLOC_FL_ZERO_RANGE flag that can be passed in the mode parameter. In this call, mode is set to 0, so that flag is not set. Further, the man page indicates that this flag is supported on XFS only as of Linux kernel 3.15, and I'm on 3.10 using CentOS 7.4. So there's no zeroing magic happening at the file system level.

1630  09:03:29.087603 ftruncate(154, 104857600) = 0


Next, there's a call to ftruncate on fd 154. This call sets the length of the file to the size passed in, in this case 100MB.

1630  09:03:29.091223 io_submit(140030075985920, 1, [{data=0x185b4dc48, pwrite, fildes=154, str=”\1\v\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0c\0\0\0\2\0\374\37″…, nbytes=8192, offset=8192}]) = 1


Next, we'll see a sequence of calls using io_submit, which submits asynchronous I/O requests to the kernel. The parameters for this one are an aio_context_t, the number of blocks in the request, then an array of AIO control blocks. The AIO control blocks are what's inside the brackets []; the key parameters are pwrite and fildes. That's a write operation and the file descriptor, which matches the fd we've been working with, 154. These 17 write operations are laying out the mdf file, which has a header and other metadata throughout the file, but they're certainly not zeroing the file out. We'll see how zeroing works when we get to the LDF.
 

1630  09:03:29.098452 fsync(154 <unfinished …>

1630  09:03:29.098640 <… fsync resumed> ) = 0


The fsync call flushes all modified data for the file referenced by the file descriptor out to disk.
 

1630  09:03:29.099260 close(154)        = 0


The close call closes the file descriptor and releases any locks held on the file; the fd number is now free to be reused.
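To make that sequence concrete, here's a minimal C sketch of the same create-then-size pattern we just traced. This is not SQL Server's code, just the syscall sequence strace showed us, with the path and size hard-coded and error handling omitted for brevity.

#define _GNU_SOURCE            /* needed for O_DIRECT and fallocate() */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* open(2) with the same flags we saw in the trace */
    int fd = open("/var/opt/mssql/data/TestFileZero.mdf",
                  O_RDWR | O_CREAT | O_EXCL | O_DIRECT, 0660);

    /* reserve 100MB of disk space; mode 0 means no FALLOC_FL_ZERO_RANGE */
    fallocate(fd, 0, 0, 104857600);

    /* set the file length to 100MB */
    ftruncate(fd, 104857600);

    /* ...SQL Server then lays out its header pages with io_submit()... */

    fsync(fd);   /* flush everything out to disk */
    close(fd);   /* release the file descriptor */
    return 0;
}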

Poking Around in Linux Internals, Creating and Zeroing the LDF

So that was the creation of the data file; now let's check out how the transaction log file is created. Remember, Instant File Initialization only applies to data files; transaction log files must be zeroed out for crash recovery. Let's dig in.
 

1630  09:03:29.831413 open("/var/opt/mssql/data/TestFileZero_log.ldf", O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660) = 154

 
We see an open call again, nearly identical to the first one, and the file descriptor returned is again 154 (the earlier fd was closed, so the number gets reused).

1630  09:03:30.395757 fallocate(154, 0, 0, 104857600) = 0

 
There’s a call to fallocate to provision the underlying storage.

1630  09:03:30.395814 ftruncate(154, 104857600) = 0


Then we see a call to ftruncate again to ensure the size of the file is 100MB.

1630  09:03:30.396466 fsync(154 <unfinished …>

1630  09:03:30.397672 <… fsync resumed> ) = 0


Then there’s a call to fsync again, flushing buffers to disk.
 

1630  09:03:30.400042 write(1, “Z”, 1)  = 1

1630  09:03:30.400088 write(1, “e”, 1)  = 1

1630  09:03:30.400134 write(1, “r”, 1)  = 1

1630  09:03:30.400180 write(1, “o”, 1)  = 1

1630  09:03:30.400246 write(1, “i”, 1)  = 1

1630  09:03:30.400301 write(1, “n”, 1)  = 1

1630  09:03:30.400348 write(1, “g”, 1)  = 1

…output omitted


Now things get special…we see a series of write calls. These writes aren't going to our file; they're going to standard out, as indicated by the first parameter, 1, which is the file descriptor for standard out. The second parameter is the data to be written, in this case a single character, and the third parameter is the size of the data being written. The return value, the last 1 on the line, is the number of bytes written. And guess where this data is being sent to…the SQL Server error log! See the string "Zeroing"?
Zeroing transaction log start
 

1630  09:03:30.406250 io_submit(140030075985920, 1, [{data=0x185b62308, pwritev, fildes=154, iovec=[{“\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300″…, 4096}…], offset=0}]) = 1

…Output omitted

1630  09:03:30.454831 io_submit(140030075985920, 1, [{data=0x185b4dc48, pwritev, fildes=154, iovec=[{“\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300\300″…, 4096}…], offset=100663296}]) = 1

 
Now for the log file we see io_submit calls again, this time MANY more of them. In the output above I show the first call at offset = 0 and the last call at offset = 100663296; there are several in between. Each io_submit here writes 4MB of data, and that last call starts at the 96MB offset, so add that final 4MB I/O and we have our fully zeroed 100MB log file.
 
The difference in the time stamps between the first call and the start of the last call is 48.581ms (09:03:30.454831 minus 09:03:30.406250). strace's time stamps have microsecond resolution.
 

1630  09:03:30.460172 write(1, “Z”, 1)  = 1

1630  09:03:30.460215 write(1, “e”, 1)  = 1

1630  09:03:30.460259 write(1, “r”, 1)  = 1

1630  09:03:30.460315 write(1, “o”, 1)  = 1

1630  09:03:30.460361 write(1, “i”, 1)  = 1

1630  09:03:30.460406 write(1, “n”, 1)  = 1

1630  09:03:30.460450 write(1, “g”, 1)  = 1


When the io_submit calls are finished, we see another series of writes to standard out. The same data shows up in the SQL Server error log, where we see that zeroing is finished and took 60ms…very close to what was reported by strace's time stamps! The small difference is the additional calls between the start-zeroing and stop-zeroing messages that are not included in my output here, plus the time the last I/O takes to complete.
 
Zeroing transaction log finish

1630  09:03:30.469088 fsync(154 <unfinished …>

1630  09:03:30.469532 <… fsync resumed> ) = 0


We call fsync to flush our buffers for this file. 

1630  09:03:30.469977 close(154)        = 0

 
Then we close the file.
 
In your strace data, you'll see another zeroing sequence for zeroing the tail of the log; here's the output from the SQL Server error log.

Zeroing tail of the log

And there we have it, this is how Instant File Initialization works on SQL Server on Linux, with a look at some Linux kernel internals along the way!

TechMentor Dine Around

Next week I'll be speaking at TechMentor in Redmond. I'm doing a half-day workshop on Linux OS Fundamentals for the Windows Admin, so be sure to come see me!

If you're there on Monday night (August 7th), let's get together for dinner! We'll talk tech and hopefully make a few new friends and networking connections!

Where – We’ll start at the Hyatt Regency Bellevue, light appetizers will be provided. Then we’ll head on over to Lot No. 3 for dinner. I’ll pick up the first round of drinks and the appetizers!

When – Monday 6:30PM at Hyatt. Dinner at 7:30PM at Lot No. 3

What – The purpose of this event is for conference attendees and speakers to get together and have a lively conversation about whatever they see fit.

Discussion Topics for our group – Microsoft and Linux! Things like PowerShell on Linux, SQL Server on Linux, Windows Subsystem for Linux, and more. In addition to me, we're going to have a special guest join us for dinner, a world renowned SQL Server expert! You'll have to come to the dinner to find out who.

I’m going to limit our dinner to 8 attendees, please email me at aen@centinosystems.com if you’re interested in attending! There are other events that night so please feel free to come to the hotel and find a group to meet up with.

 


Speaking at TechMentor – Redmond

I’ll be speaking at TechMentor, August 7-11 at Microsoft HQ in Redmond. Surrounded by your fellow IT professionals, TechMentor provides you with in-depth, immediately usable training that will keep you relevant in the workforce. 

I'll be presenting the following session: Workshop: Linux OS Fundamentals for the Windows Admin. This will be a fun session getting you started on your road to Linux proficiency; we'll install Linux together and work through things like command line syntax, building complex commands with pipelines, performance basics, and package management.

Amplify your knowledge at TechMentor Redmond — bring the issues that keep you up at night and prepare to leave this event with the answers, guidance and training you need.  

Register now: http://bit.ly/TMSPK17Reg 

SPECIAL OFFER: As a speaker, I can extend $500 savings on the 5-day package. Register here: http://bit.ly/TMSPK17Home

 


Speaking at PASS Summit 2017

I'm very pleased to announce that I will be speaking at PASS Summit 2017! This is my first time speaking at PASS Summit and I'm very excited to be doing so! What's more, I get to help blaze new ground on an emerging technology, SQL Server on Linux! My session is Monitoring Linux Performance for the SQL Server Admin, so if you're a Windows or SQL Server administrator, this session is for you. We'll look at some of the internals of SQL Server on Linux, dive into Linux OS internals, and show you where to look inside Linux for the most important performance data for your SQL Server. I hope to see you there!

 

Monitoring Linux Performance for the SQL Server Admin

Abstract

So you're a SQL Server administrator and you just installed SQL Server on Linux. It's a whole new world. Don't fear, it's just an operating system. It has all the same components Windows has, and in this session we'll show you that. We will look at the Linux operating system architecture and show you where to look for the performance data you're used to! Further, we'll dive into SQLPAL and how its architecture and internals enable high performance for your SQL Server. By the end of this session you'll be ready to go back to the office with a solid understanding of monitoring performance on Linux systems and on SQL Server on Linux. We'll look at the core system components of CPU, disk, memory, and networking, cover monitoring techniques for each, and look at some of the new tools available, including new DMVs and DBFS.

 


Reflecting on the Last Year of Microsoft’s OpenSource Technologies

This past year has certainly been interesting in the world of Linux. Microsoft has taken a new strategy and is embracing the open source model, releasing its key software products with versions for Linux. It's truly a remarkable time. In this post I want to highlight some of the bigger events, cover what this means to you, and point out where you can go to get some training on these topics.

Here are some of the highlights from the last year:

Microsoft becomes a Platinum Member in the Linux Foundation – this means Microsoft is committing itself to a long-term investment in the open source community and to continuing to develop open source software. Don't believe me on the open source thing? Well, check out their GitHub repo. Who would have seen this coming?

Now, let's look at the new tools you have to build cross-platform applications and develop your systems:

  • .NET Core – Literally you can build native .NET applications to run on any platform, Windows, Linux, Mac…Docker!
  • Bash on Ubuntu on Windows – One of the primary reasons I bought my first Mac years ago was that I wanted a bash shell; well, now I'm not tied to that hardware anymore.
  • Visual Studio Code – With all this cross platform stuff, you’ll need a consistent development environment, VS Code runs on Windows, Linux and Mac. And it’s darn nice too. Very extensible with many languages available. 
  • SQL Server on Linux – This is the real deal, it’s fast and consistent with your existing SQL Server experience. I’ve blogged about it a bit :)
  • PowerShell Core – Microsoft adds another management tool to your tool belt. Windows, Linux, and Mac can all be managed with one language. For me, this was mind blowing; I got to do a training video with literally the inventor of PowerShell, Jeffrey Snover, and MVP Jason Helmick! I blogged about PowerShell a bit too.

What does this mean to you?

So what does this mean to you? Get out there, start learning about this stuff, and discover how it can impact you. In the coming years new solutions are going to be developed using these components, and it's on you to train yourself and learn how to leverage these tools to solve problems.

I've spent the last year developing some fun training at Pluralsight that I think you should check out. The training is based on the Linux Foundation Certified Engineer curriculum and takes you from installation up to a running Linux system.

  • Understanding and Using Essential Tools for Enterprise Linux 7 – If you're new to Linux, start here! This course will help you install Linux and get oriented with the operating system and the command line interface.
  • LFCE: Advanced Network and System Administration – Next, you’ll need to learn how to control your system’s services, install packages, manage performance and share data between systems. Check this course out to make your Linux system really work for you
  • LFCE: Advanced Linux Networking – Your systems don’t stand alone, in this course you’ll dive deep into how data moves between Linux systems. Protip, these concepts apply to Windows systems too.
  • LFCE: Network and Host Security – My newest course. Let's learn how to secure our Linux systems from both the networking and host perspective. We'll cover security concepts and architectures, securing Linux services, and take a deep dive into OpenSSH and remote access.
  • LFCE: Linux Service Management – HTTP Services – I’m currently developing a course on HTTP Services – you’ll learn how to install, configure and manage Apache.
  • More to follow – announcements coming up soon! I can’t wait to tell you what’s next.

I've got tons of blog posts on these topics, too.

So go ahead, get digging in there: download Linux (yes, I prefer CentOS), install SQL Server and PowerShell, and start moving your skills towards where the technology is going to take you!

Speaking at SQLSaturday Sacramento – 650!

Speaking at SQLSaturday Sacramento!

I’m proud to announce that I will be speaking at SQL Saturday Sacramento on July 15th 2017! And wow, 650 SQLSaturdays! This one won’t let you down. Check out the amazing schedule!

If you don't know what SQLSaturday is, it's a whole day of SQL Server training available to you at no cost!

If you haven’t been to a SQLSaturday, what are you waiting for! Sign up now!


This year I have TWO sessions!

1. Linux OS Fundamentals for the SQL Admin

SQL Server and PowerShell are now available on Linux, and management wants you to leverage this shift in technology to more effectively manage your systems. But you're a Windows admin! Don't fear! It's just an operating system! It has all the same components Windows has, and in this session we'll show you that. We will look at the Linux operating system architecture and show you how to interact with and manage a Linux system. By the end of this session you'll be ready to go back to the office and get started working with Linux with a fundamental understanding of how it works.

2. Designing High Availability Database Systems using AlwaysOn Availability Groups

Are you looking for a high availability solution for your business critical application? You've heard about AlwaysOn Availability Groups and they seem like a good solution, but you don't know where to start. It all starts with a solid design. In this session we introduce the core concepts needed to design an Availability Group based system, covering topics such as recovery objectives, replica placement, failover requirements, synchronization models, quorum, backup and recovery, and monitoring. This session is modeled after real world client engagements conducted by Centino Systems that have led to many successful Availability Group based systems supporting tier 1 business critical applications.

dbfs – command line access to SQL Server DMVs

With SQL Server on Linux, Microsoft has recognized that they're opening up their products to a new set of users: people who aren't used to Windows and its tools. In the Linux world we have a set of tools that work with our system performance data and present it to us as text. Specifically, the place to go for nearly all of the Linux kernel's performance and configuration data is the /proc virtual file system, procfs. Inside here you can find everything you need that represents the running state of your system. Processes, memory utilization, disk performance data…all of this is presented as files inside of directories inside /proc.

Now, let's take this idea and extend it to SQL Server. In SQL Server we have DMVs, dynamic management views. These represent the current running state of our SQL Server. SQL Server exposes the data in DMVs as table data that we can query using T-SQL.

So Microsoft saw the need to bring these two things together: expose the internals of SQL Server and its DMVs to the command line via a virtual file system. And that's exactly what dbfs does, it exposes all of SQL Server's DMVs as text files in a directory. When you access one of the text files, a query executes against the SQL Server and the query output comes back to you via standard output to your Linux console. From there you can use any of your Linux command line fu…and do what you want with the data returned.

Setting up dbfs

So first, let's go ahead and set this up. I already have the Microsoft SQL Server repo configured, so I can install via yum. If you have SQL Server on Linux installed, you likely already have this repo too. If not, go ahead and follow the repo setup instructions here. To install dbfs we use yum on RHEL-based distributions.
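Here's the install on my system; I'm assuming the package is published in that repo under the name dbfs, so adjust if yours differs.

sudo yum install dbfs -y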

First off, think about what's going on under the hood here…we're going to allow the system to execute queries against DMVs, so let's try to keep this as secure as possible. I'm going to create a user that is allowed only to query DMVs, using the VIEW SERVER STATE permission. So let's do that…
 
Let's log into our SQL Server via SQLCMD and execute this code to create a user named dbfs_user.
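A sketch of that; the sa connection and the password here are just placeholders for your environment.

sqlcmd -S localhost -U sa
1> CREATE LOGIN dbfs_user WITH PASSWORD = 'SuperStrongPassword1!';
2> GO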

Once created, let’s assign this user permissions to query DMVs
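Here we grant the VIEW SERVER STATE permission mentioned above.

1> GRANT VIEW SERVER STATE TO dbfs_user;
2> GO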
The next step is to create a directory where dbfs will place all the files representing the DMVs we wish to query.
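I'm using /tmp/dbfs in the examples that follow, but the location is entirely up to you.

mkdir /tmp/dbfs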
Now, let's go ahead and configure dbfs. I'm going to place its configuration file in /etc/, since that's the standard location for configuration files on Linux systems.
And inside that file, let's use the following configuration. Pretty straightforward: define a configuration name, here you see server1; the hostname, which is the locally installed SQL instance; the username and password of the user we just created; and also a version. While this isn't very well documented, the code shows that if you're on version 16 (SQL Server 2016) or newer it will create dbfs files with a .json file extension, which exposes your DMV data as…you guessed it, JSON. Also, if you want to add a second server to dbfs, just repeat the configuration inside the same text file.
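Something along these lines in /etc/dbfs.config; the section, user, and password are the ones from this walkthrough, and the key names are from my reading of the dbfs docs, so double-check against the GitHub repo.

[server1]
hostname=localhost
username=dbfs_user
password=SuperStrongPassword1!
version=16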

Running dbfs

Now with all the preliminaries out of the way, let's launch dbfs. Basic syntax here: the actual program name, with the -c parameter pointing to the configuration file we just created and the -m parameter pointing to the directory we want to "mount" our DMVs into.
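With the file and directory names I've been assuming, that looks like this.

dbfs -c /etc/dbfs.config -m /tmp/dbfs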
Now, what's interesting about dbfs is that if you log out, dbfs stays running. Honestly, I don't like that; if that's the case it should be running as a service managed by systemd or whatever init daemon you're using on your Linux distribution. I mentioned that on their GitHub repo. If this is going to be a user process, then I should have the choice to background the task myself.

Using dbfs

Looking at the source for dbfs, it gets a list of all DMVs from sys.system_views on the SQL Server you configured it to connect to, then creates a file for each and every one of those DMVs. So we have full coverage of all the DMVs available to us, and since you can use any bash command line fu to access the data, the options are really limitless. Microsoft has a few good demos on the GitHub repo here. Let's walk through a few examples now.
 
Accessing a DMV

This is pretty straightforward: you read from the file just like you would read from any other file on a Linux system. So let's do that…we add column with the -t option to make sure all the columns are aligned in the output.
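For example, against dm_exec_connections, assuming dbfs creates one directory per configured server under the mount point we set up above.

cat /tmp/dbfs/server1/dm_exec_connections | column -t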

And our output looks like this…

Notice in the output above how the connect_time column is split incorrectly? We need to tell column to use the tab as a delimiter; by default it uses whitespace. So let's do that…
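Same command, this time passing the tab as the column delimiter.

cat /tmp/dbfs/server1/dm_exec_connections | column -t -s $'\t'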
And now our output looks much better

Selecting off a subset of columns

Well, you probably noticed that the output is a bit unruly since it's outputting all of the DMV's columns. So let's tame that a bit and pull out particular columns. To do that we'll use a tool called awk, which prints out columns based on a numeric index, so $1 is the first column and so on.
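For example, asking for the 1st and 7th columns; this is the naive, whitespace-delimited version that gets us into trouble below.

cat /tmp/dbfs/server1/dm_exec_connections | awk '{ print $1, $7 }'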
And our output looks like this
Something isn't right…as DBAs we think of things in rows and columns, so we count across the top and expect the 7th field to yield the 7th column and its data for each row, right? Well, it will, but data processed by awk is whitespace delimited by default and is processed row by row. So the 7th field in the second line isn't necessarily the same column as in the first line. This can be really frustrating if your row data has spaces in it…like, you know, dates.
 
So let's fix that…the output from the DMVs via dbfs is tab delimited. We can define our delimiter for awk with -F, which will allow for whitespace in our data by breaking the data only on the tabs. Let's hope there aren't any tabs in our data!
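The same column selection, now splitting only on tabs.

cat /tmp/dbfs/server1/dm_exec_connections | awk -F $'\t' '{ print $1, $7 }'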
And the output from that looks like this. Much better, but we don't have the nice columns.
We're so close. We can't just throw column on the end to make this nice and columnar, because awk with this configuration will remove the tab delimiters from its output stream, and column by default will do the same thing. But we can let column do the work for us first and have it print tab delimiters in its output stream.
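A sketch of that idea: column pads the fields and keeps tab separators in its output (the -o flag needs a reasonably recent util-linux), then awk picks the already padded columns.

cat /tmp/dbfs/server1/dm_exec_connections | column -t -s $'\t' -o $'\t' | awk -F $'\t' '{ print $1, $7 }'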
And voila, we end up with some nice neatly formatted output

Searching in Text

We can search for text in the output using grep, here’s a quick example looking for the dedicated admin connection in dm_os_schedulers
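Something like this; the DAC scheduler shows up in the status column, so a simple grep finds it.

cat /tmp/dbfs/server1/dm_os_schedulers | grep DAC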
And here’s the output. 

SQL folks, keep in mind grep will only output the lines it matches, so we lose the column headers here since they're part of the standard output stream when accessing the file/DMV data.

Moving forward with dbfs

We need the ability to execute more complex queries from the command line. Vin Yu mentions this here. As DBAs we already have scripts that we use day to day to help us access, and more importantly make sense of, the data in the DMVs, so dbfs should allow us to execute those scripts somehow. I'm thinking we could have it read a folder on the local Linux system at runtime, create files for those scripts, throw them in the mounted directory, and allow them to be accessed like any of the other DMVs. The other option is to place those scripts as views on the server and access them via dbfs. Pros and cons either way. Since it's open source…I'm thinking about implementing this myself :)

Next, we somehow need the ability to maintain column context throughout the output stream; for DBAs it's going to be a tough sell having to deal with that. I know JSON is available, but we're talking about DBAs and sysadmins as the target audience.

In closing, this is a great step forward…giving access to the DMVs from the command line opens up SQL Server to a set of people who are used to accessing performance data this way. Bravo!

PowerShell Remoting in Multi-Platform Environments – Use Cases

In our previous post we discussed how to implement OpenSSH (the plumbing) as the transport layer for PowerShell remoting. In this post, we're going to leverage that configuration and look at some common remoting use cases. This is one of the core things I use every day when I work with PowerShell. Remoting gives me the ability to scale up my administration and manage large collections of systems. So like I said in my very first post about my PowerShell journey, it's become a part of my everyday life, and techniques like this are at the core of how I use PowerShell. So let's get started…

We’re going to look at the following

  • Entering a remote command line interface on a server
  • Using sessions
  • Executing commands against one remote system
  • Executing commands on a collection of remote systems

This is going to be awesomeness…Yep, I said awesomeness, when I’m finished you’ll agree that’s about the only way to describe this.

I do want to point out that we’re using Beta software here, things change. The version of PowerShell I’m using here is Version 6 Beta 2.

Remote Command Line Interface

Up first, let’s cover the simplest remoting case, where we use remoting to get a command line interface to a remote system.
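The host name and user below are placeholders for your environment; swap in your own.

Enter-PSSession -HostName server1.contoso.com -UserName demo
# after authenticating, the prompt changes to something like:
# [server1.contoso.com]: PS /home/demo>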

Nothing special here, simple syntax, but the seasoned PowerShell remoting pro will notice that we're using a new parameter here: -HostName. Normally in Windows PowerShell you have the -ComputerName parameter. Now, I don't know exactly why this is different, but perhaps the PowerShell team needed a way to differentiate between OpenSSH and WinRM based remoting. Further, Enter-PSSession now has a new parameter, -SSHTransport, which at the moment doesn't seem to do much, since the remoting cmdlets currently use OpenSSH by default. But if you read the code comments here, it looks like WinRM will become the default and we can use this switch parameter to specify SSH as the transport.

Once we execute this command, you'll have a command prompt on the system you passed to -HostName. The prompt indicates you're on a remote system by putting the server name you're connected to in square brackets, followed by your normal PowerShell prompt. That's it, you now have a remote shell. Time to get some work done on that server, eh? Want to get out of the session? Just type exit.

Using Sessions

So, in PowerShell, when you exit a remoting session, the session goes away; any work, variables, or program state you had will disappear. But what if we want to keep our session's state, log out, and log back into that session again? We can leverage sessions to persist our remote sessions: we can create a session, then connect to and disconnect from it at will. So let's check that out.

We create a new session by calling New-PSSession. This starts up the remoting session just like we did with Enter-PSSession, via SSH, but we won't attach to the remote terminal. The output we do get from New-PSSession shows us the details of the session created, and we use Get-PSSession to get a list of all current sessions.
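A sketch with the same placeholder host as before.

New-PSSession -HostName server1.contoso.com -UserName demo
Get-PSSession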

We can connect to an existing session with Enter-PSSession. If there are multiple entries in your session list…just use the correct Id from the session list to identify the session you want to connect to.
Now, when you exit from this session with the exit command, your session will persist. Let’s use Get-PSSession to see.
Cool, our session is still there; let's reuse it. This time we're going to assign it to a variable, which will keep our command line syntax simple in the upcoming examples.
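Putting those steps together, and assuming the session we created got Id 1:

Enter-PSSession -Id 1      # attach to the existing session
exit                       # drop back to the local prompt, the session persists
Get-PSSession              # the session is still listed
$s = Get-PSSession -Id 1   # grab it into a variable for the next examples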

Executing Commands Against One Remote System

We can use PowerShell remoting to execute a command against a remote system, so let's look at the simplest case…just one command. Here we're using the Invoke-Command cmdlet to execute Get-Process on the remote system, and its output returns to our console: the top 5 processes by CPU on the remote system.
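A sketch of that call, using the $s session from above.

Invoke-Command -Session $s -ScriptBlock {
    Get-Process | Sort-Object -Property CPU -Descending | Select-Object -First 5
}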

Executing Commands Against a Collection of Remote Systems

Now, let's add one more session to our list of sessions, this time a session to server2.
New-PSSession just returns the session created, so let's get a list of all of the sessions with Get-PSSession. There you can see our two remoting sessions are currently open, and the new session has a new Id.
This time, let’s take both our sessions and save them to a variable. We’re going to reuse $s.
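Again with placeholder names; server2 here stands in for your second host.

New-PSSession -HostName server2.contoso.com -UserName demo
Get-PSSession
$s = Get-PSSession    # $s now holds both sessions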
Now to the amazing part…let’s run a command against a collection of remote systems.
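The command is the same shape as before, but because $s now holds both sessions it runs on each remote machine, and the sorting happens inside the script block, remotely.

Invoke-Command -Session $s -ScriptBlock {
    Get-Process | Sort-Object -Property CPU -Descending | Select-Object -First 5
}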
This example brings up a very interesting point about PowerShell remoting: everything inside the curly braces of the Invoke-Command call happens on the remote system. This means all the sorting and processing is remote; the remote systems deal with all the computational aspects and less data has to traverse the network. (There's a deeper topic of serialized objects we'll cover in a later post.) Now, the output from these systems comes back to your console as fast as it arrives, so there's really no order here, and you might get a list of data that's all commingled. But we can control that by sorting locally. Let's look at that.
Now, let's say you want to get the top 5 processes across all the systems in your sessions list. Easy enough.
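A sketch of the locally sorted version: only Get-Process runs remotely, and Sort-Object and Select-Object now run in our local pipeline.

Invoke-Command -Session $s -ScriptBlock { Get-Process } |
    Sort-Object -Property CPU -Descending |
    Select-Object -First 5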

We move the sorting to the pipeline on our local system. Notice the Invoke-Command call is going to return the entire process list from all machines; the data is then sorted locally and output to the local console. So in this list we have the top 5 processes across our two systems. Imagine you had a web farm of servers and you needed to chase down a bad process fast; this would be useful, right?

When we're all finished, you'll want to clean up your sessions. We can do that by passing the $s variable into Remove-PSSession.
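Which is as simple as this.

$s | Remove-PSSession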

And that's it. Like I said in my very first post about my PowerShell journey, it's become a part of my everyday life, and techniques like this are at the core of how I use PowerShell.

PowerShell Remoting in Multi-Platform Environments using OpenSSH

So in my last post I told you how I started my journey learning PowerShell; let's keep going down that path together. In this post I'm going to introduce PowerShell remoting in multi-platform environments, specifically using OpenSSH. We'll discuss WinRM in multi-platform environments in an upcoming post.

Have you ever had to execute a command against one system or a collection of systems? Have you ever wanted a remote shell on a Windows system? Using remoting you can do all of these things, very, very easily.

The Plumbing

So for our first stop, let's talk about how remoting works. Essentially, what's occurring under the hood is a PowerShell process on one system sending commands to a PowerShell process on another system, and the remote system sending the output back. Easy enough, right? Let's dig into how they exchange information.

The PowerShell processes exchange messages defined in the PowerShell Remoting Protocol (PSRP). PSRP defines the semantics of the communication and the types of messages being exchanged. Now, these messages need a transport layer; for that we have WinRM and OpenSSH. Let's look at both.

By default, Windows PowerShell remoting connections communicate using WinRM, which uses HTTP/S endpoints. WinRM is an implementation of the WS-Management industry standard; WS-Man defines the semantics of the interactions between the systems and the data transfer. PowerShell encapsulates its PSRP messages and uses WinRM to transport them between systems.

On Linux systems, well…they don't communicate with WinRM by default, so we need something else…that's where OpenSSH comes in. PowerShell encapsulates its inputs and outputs as PSRP messages and uses SSH to securely exchange the messages between systems.

The PowerShell team has implemented WinRM-over-HTTP remoting on Linux and already has a port of OpenSSH on Windows. The key idea going forward is that we will have a choice: we can use WinRM (HTTP/S) or we can use SSH, regardless of OS. For the rest of this post, we're going to talk about remoting over OpenSSH, even on the Windows side of things. We'll discuss WinRM in multi-platform environments in an upcoming post.

The Installation

For starters, go ahead and install PowerShell Core on your systems if you haven't already; it's pretty straightforward, check out this link here.

Next, on Windows we need to install OpenSSH, and the PowerShell team has developed a Win32 port of OpenSSH, so go check that out here. I'm going to be honest, the installation isn't easy. But I'm going to let you in on a secret: Darwin Savoy wrote a killer Chocolatey package that does the hard work for you. Check that out here. You'll want to use the following syntax to install OpenSSH as a service.
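From memory, the package parameter that installs the server component looks like the line below; treat it as a starting point and check the package page for the current syntax.

choco install openssh -params '"/SSHServerFeature"' -y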

If you're on Windows, verify OpenSSH is up and running.
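On my machine the service installed by that package is named sshd, so a quick check looks like this; adjust the name if yours differs.

Get-Service -Name sshd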

Now on Linux, just about every distribution includes an OpenSSH installation in its default setup…so we're not going to cover that here.

The Configuration

Whether it's Windows or Linux, we need to tell OpenSSH about PowerShell, and we can do that with a subsystem configuration.

On a Windows system we'll need a line like this in the SSH daemon configuration file, C:\Program Files\OpenSSH\sshd_config. Find the subsystem section and add this line.
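Something like the following; the exact path depends on which PowerShell version you installed (newer releases name the binary pwsh.exe), and the 8.3 short path sidesteps the space in "Program Files".

Subsystem    powershell    c:/progra~1/powershell/6.0.0-beta.2/powershell.exe -sshs -NoLogo -NoProfile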

On a Linux system we’ll need a line like this one in /etc/ssh/sshd_config. Find the subsystem section and add this line.
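On my CentOS box the PowerShell Core binary lives at /usr/bin/powershell (current releases install /usr/bin/pwsh), so the line looks like this.

Subsystem    powershell    /usr/bin/powershell -sshs -NoLogo -NoProfile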

Basically what's going on here is we're telling OpenSSH to invoke a PowerShell process when it receives a connection from a remote system that says it's going to use the powershell subsystem. So SSH is just the conduit for these two PowerShell processes to communicate their inputs and outputs.

When you’re finished editing these files, you’ll want to restart the SSH daemons on your system…Windows or Linux.

On Windows that's Restart-Service -Name sshd; on Linux that's systemctl restart sshd.

Now here's the thing…before you move on, test your SSH connections. You will absolutely go crazy if there's an issue with your SSH configuration and you try using PowerShell remoting right away; PowerShell will happily swallow OpenSSH's errors and you'll get no feedback. Simply grab your favorite SSH client, try to connect to the systems right now, and verify you can connect and get a command line. Things will be a lot easier if you do this before you try to use remoting.

Up next in this series we'll look at some remoting use cases, specifically using a remote command line interface, executing commands against a remote system, and executing commands against a collection of remote systems.

Why PowerShell?

Why do I use PowerShell?

Well, here's a little back story…last year I was involved in a Pluralsight Play by Play with Jason Helmick and Jeffrey Snover for the launch of open source PowerShell on Linux and Mac. Before this video, I didn't take PowerShell seriously. Basically, if I Googled a problem and found a solution in PowerShell, I would grind my teeth, copy and paste the text into the foreign blue console, and cross my fingers.

Fast forward a bit to PowerShell Summit this year: Jason and I were slated to do a session together on…PowerShell on Linux. I had to miss the trip to Seattle this year, but during the session Jason told a story about me learning PowerShell…check it out here around 28:27. It's pretty funny.

Pipelines similar but different

If you don't want to listen to Jason's story, it goes like this: I needed to learn PowerShell…FAST. I had only a few weeks to build my skills enough so that I could sit at a table with the inventor of PowerShell and an industry recognized expert, have a meaningful conversation, and oh yeah, record a 5 hour training video. So, as Jason says, I "did the Don Jones/Jeff Hicks thing" (there's a newer version available now).

Well, having a UNIX background, I understood the concept of a pipeline. UNIX processes have had pipelines for a long time: byte streams move data from one command to the next, allowing you to build more complicated commands. PowerShell can do the same, but it moves objects rather than text/byte streams.

Well, while learning PowerShell I built a PowerShell cmdlet pipeline to get the top 10 processes on a system sorted by CPU.
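That pipeline is built from the three cmdlets called out later in this post, so it looks something like this.

Get-Process | Sort-Object -Property CPU -Descending | Select-Object -First 10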

And to do the same in bash, we need to do something like this…
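One way to write it with classic tools: %CPU is the third column of ps aux, so we sort on that numerically and take the first ten rows.

ps aux | sort -rnk 3 | head -n 10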

So I worked through these examples, fired up my email client, and sent an email to Jason: hey Jason, here are two commands that do the same thing, one in PowerShell and one in native bash commands. Which one do you think I like better?

As we learned in the PowerShell Summit video, he expected me to answer with my crotchety, suspenders wearing UNIX guy answer…well he was in for a surprise. I told him I liked the first one better…the PowerShell version.

I actually don't own suspenders...neckbeards are another story

Now hear me out. I've been using Linux/UNIX since 1997, I manage large internet commerce sites through the holiday season, I've done PhD-level research on topics like I/O, CPU scheduling, and memory management…literally my fingers can type the bash commands to get the top 10 processes without even thinking, muscle memory to the max. BUT the PowerShell version of this literally reads like a sentence: Get-Process, Sort-Object, Select-Object. Pretty straightforward stuff, no surprises, the commands do exactly what they say. This means I can put stuff like this in a script and other people can read it without 20 years of experience.

What’s next?

Well, since the Play by Play, PowerShell isn't the Google-the-answer, copy/paste solution for me anymore…I've decided to really take this seriously, and PowerShell is now a go-to tool in my toolbox. I'm not telling you this because I'm a Microsoft MVP, I'm not telling you this because I did a video with THE PowerShell gurus…I'm telling you this because I really use PowerShell now…every day.

This post is the first in a series I plan on bringing to you that will document my discovery process of using PowerShell on Linux. We'll discuss the techniques and technologies I use to solve some real world problems in a multi-platform world!

Now, there are two things I really consider special about my first go with PowerShell: asynchronous jobs via remoting and Desired State Configuration (DSC). We'll cover both of these topics in this series.