
Book Review – SQL Server Internals: In-Memory OLTP

In-Memory OLTP – a potentially game-changing technology

Every once in a while a technology comes out that has the potential to change things dramatically. In-Memory OLTP (Hekaton) is one of them. The design team set out with a goal of reaching an order-of-magnitude improvement over existing technologies and techniques. To do so they had to rethink key facets of the relational database system: latching, locking, logging, and statement compilation. When a technology as potentially disruptive as this comes along, it gets everyone's attention, so when an opportunity to review a book on this technology came along, it was certainly worth the effort. I spent a ton of time with this book (maybe a little too much), reading and re-reading chapters, but it was worth every minute.

SQL Server Internals: In-Memory OLTP by Kalen Delaney ( twitter | blog ) is one of the first books dedicated to the topic and will definitely be (already is) the go-to book on it. The author is a well-known SQL Server internals expert with an extensive history of top-tier publications and presentations, and this book is no exception.

SQL Server Internals: In-Memory OLTP

In-Memory OLTP introduces a fundamental shift in how the database engine stores and accesses data. This book walks you through each of these topics with almost perfect precision. I would suggest reading the book from beginning to end; that way, when needed, you can refer back to a chapter with an understanding of its context in the larger system.

  • The first three chapters are the preliminaries of In-Memory OLTP. Chapter 2 introduces the basics and some of the limitations of the technology, and in Chapter 3 the reasons behind those limitations become clear with the discussion of row structure. These chapters bring to light that we're dealing with something new and unique: a traditional in-memory data structure that respects the ACID properties of a transactional relational database system.
     
  • I felt that the second half of Chapter 4, on range indexes, struggled to eloquently describe the technical nuances of the bw-tree data structure. I read this chapter several times and still struggled. Also, it is not entirely clear what the benefit of the mapping table is; I had to refer to the research paper for clarity here.
     
  • Chapter 5 describes transactions and the optimistic concurrency design that In-Memory OLTP uses. In contrast to pessimistic concurrency, which requires latches and locks, optimistic concurrency uses a validation technique at the time a transaction attempts to commit its data. Even though it's covered in detail in other texts, I would have liked deeper coverage of read phenomena here.
     
  • The best chapter in the book is Chapter 6, on logging, checkpoint, and recovery. Logging and recovery are completely re-engineered in In-Memory OLTP. This chapter walks you through the entire process, culminating in an exceptional discussion of the recovery process. This chapter is so smooth it feels like a conversation.
     
  • Chapter 7 describes native compilation of tables and stored procedures. It's interesting that after the bottlenecks of disk, latching, and locking are removed, we're left with compilation as a potential bottleneck. This chapter presents the techniques used by In-Memory OLTP's native compilation feature and its limitations.
     
  • The book concludes with a chapter on configuration best practices and where In-Memory OLTP fits in the database world. This part of the book is key for those who have to evaluate the technology for its architectural fit into their data-tier.

There are a few shortcomings with this book: the bw-tree could have been more clearly explained, and parts of the book may be hard to approach for some, but it is covering a bleeding-edge topic that is still being actively researched. The book walks the line between IT topic and graduate-level computer science. I'm surprised I didn't see some complexity analysis in some of the chapters :)

My detailed chapter notes are available below.

In summary:

This is the book that you will reach for when you have a question about how In-Memory OLTP works. I really look forward to the day when the technology matures enough that some of its current limitations are relaxed.

Thank you to Kalen and the folks at RedGate for this release!

If you need help evaluating In-Memory OLTP and where it fits in your data-tier or system architecture, please feel free to contact me at aen@centinosystems.com

Follow me on Twitter @nocentino

Book Review – SQL Server Internals: In-Memory OLTP – Detailed Notes

Here are my unedited chapter notes:

Chapter 1
Chapter 2
  • Collation – the current version requires BIN2 on character index columns. Best to do so at the column level; that removes the need for case-sensitive code for table and column names, but not for the data. Supports only sorting, comparison and grouping. (See the sketch after these Chapter 2 notes.)
  • Use COLLATE DATABASE_DEFAULT to handle the tempdb collation of temp objects. Research more.
  • Interpreted T-SQL via interop is useful for ad hoc access or for migrating code. Doesn't perform as well as natively compiled code.
  • Key point: the restrictions really show that the Hekaton (Hk) tables need to be isolated because of the interface between the engines. P 46
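For reference, here's a minimal sketch (not from the book) of what the column-level BIN2 collation looks like in practice. It assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup; the table and column names are made up.

```sql
-- A minimal sketch, not from the book: only the indexed character column gets the
-- BIN2 collation, so object names stay case-insensitive while comparisons against
-- that column's data become binary (and therefore case-sensitive).
CREATE TABLE dbo.Customer_InMem
(
    CustomerId   INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    CustomerCode VARCHAR(20) COLLATE Latin1_General_100_BIN2 NOT NULL,
    CustomerName NVARCHAR(100) NULL,   -- not indexed, keeps the database collation
    INDEX ix_CustomerCode NONCLUSTERED (CustomerCode)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```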
Chapter 3
  • The restrictions of Chapter 2 become clearer in this chapter with the description of the row structure – a traditional CS data structure
  • All indexes can be thought of as covering; each row has the actual data.
  • Also clear why table structures can't be modified. P51
  • Good example highlighting the time stamp usage in MVCC
Chapter 4
Hash Indexes
  • Indexes must be defined at table creation – part of the header
  • Indexes are in memory structures only and operations on them are not logged
  • Hash index – hashing function and collisions explained. Linked lists in buckets
  • Index pointer – an array in the row header. Null when the row points to no other rows. Points to the "next" row in the index at that hashed index entry. Every row has at least one pointer to it, either directly from the hash index bucket or from another row
  • After reading a second time this was much clearer
  • Hash indexes useful for equality search with high cardinality
  • For seeking on a hash index you must provide all key columns. For example, an index on (last, first) with a search on last only will not use the index
  • Buckets – the number of elements in the hash array. Should be equal to or greater than the cardinality of the index key columns, so that a bucket points to a unique value's chain. Too few and you get collisions between different values; too many is wasteful. sys.dm_db_xtp_hash_index_stats (see the sketch after these hash index notes)
  • Large average chain length could mean
    • 1. Empty Bucket count low. Avg and max chain lengths are similar. Many different index keys to the same bucket.
    • 2. Empty bucket count high. Max chain length is high in relation to the average chain length. Many rows with duplicate keys or skew in the key values. Not evenly distributed
  • Short chains along with a high empty bucket count could indicate the bucket count is set too high
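Here's a minimal sketch (not from the book) of the kind of query I used to sanity-check bucket counts against the symptoms above; the column names are from the SQL Server 2014 DMV mentioned in the notes.

```sql
-- Check bucket counts, empty buckets, and chain lengths for every hash index
-- in the current database.
SELECT OBJECT_NAME(hs.object_id) AS table_name,
       i.name                    AS index_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON i.object_id = hs.object_id
   AND i.index_id  = hs.index_id;
```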
Range Indexes
  • Range indexes – if the cardinality is unknown or you will be range searching, hash indexes aren't helpful (see the sketch after these range index notes)
  • A range index connects rows at the leaf level using a bw-tree, a latch-free variant of a b-tree. Index pages are variable sized. An index page contains an ordered set of key values. For each value there's a pointer to either another internal index page or to a leaf page pointing to a data row. At the data row there could be a list of rows linked together on the key value
  • A key distinction for bw-trees is that the page pointer is a logical PID rather than a physical memory address. The PID is the position in the mapping table, which contains the physical location. Pages are never updated; they are replaced, and the mapping table is updated so the PID maps to the new physical address
  • At the leaf level still uses the PID which points to the data row
  • Delta record – a page containing an insert or update. The mapping table is updated with the physical address of the delta record. Searching/range scanning may have to traverse this chain, that is, until consolidation occurs – combining pages into a new base page. The benefit is not having to update the tree during each update
  • It’s not clearly explained what the benefit of the mapping table is. 
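A minimal sketch (not from the book, names made up) of the hash vs. range choice described above: the equality-lookup key gets a hash index, and the column that will be range searched gets a range (NONCLUSTERED) index backed by the bw-tree.

```sql
CREATE TABLE dbo.OrderHeader_InMem
(
    OrderId   INT       NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 2097152),
    OrderDate DATETIME2 NOT NULL,
    Amount    MONEY     NOT NULL,
    INDEX ix_OrderDate NONCLUSTERED (OrderDate)   -- range index (bw-tree)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- The hash index on OrderId can't help with an interval predicate; the range
-- index can seek on it.
SELECT OrderId, OrderDate, Amount
FROM dbo.OrderHeader_InMem
WHERE OrderDate >= '2014-01-01' AND OrderDate < '2014-02-01';
```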
Chapter 5
  • Pessimistic: acquiring a shared lock or range lock. Optimistic: checking that rows haven't changed before the transaction commits
  • In MVCC SQL assumes that concurrent transactions won’t interfere and performs validation checks once a transaction commits to ensure it obeys the required isolation level
  • Isolation levels used in In-Memory OLTP:
    • Snapshot
    • Repeatable read
    • Serializable
    • There should be a description of the read phenomena
  • Cross-container transactions are really two sub-transactions: one disk-based and one in-memory transaction
  • If executing an explicit transaction it will run in read committed; if it accesses an In-Memory OLTP table this will error, as that is an unsupported isolation level there. Do specify the isolation level as a table hint. Read committed is supported during auto-commit or implicit transactions
  • Technically for a single statement transaction there is no difference between read committed and snapshot on a memory optimized table
  • Snapshot cannot be used in cross container transactions because each is considered/implemented as a sub transaction
  • The default isolation level is snapshot. There's a new DB option to make it the default for memory-optimized tables; alternatively, set the isolation level with a table hint (see the sketch after these Chapter 5 notes)
  • Monitoring transactions: sys.dm_db_xtp_transactions – xtp_transaction_id is the sequential global transaction ID used for the begin-Ts and end-Ts timestamps
  • Read-write conflict – a transaction tries to update a row that has been updated by an active transaction. But since there are no locks, anomalies can still occur.
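A minimal sketch (not from the book) of both options the notes mention; the database name IMOLTP_Demo is made up, and dbo.Customer_InMem is the table from the earlier collation sketch.

```sql
-- Option 1: elevate interop access under READ COMMITTED to SNAPSHOT automatically.
ALTER DATABASE IMOLTP_Demo
    SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;

-- Option 2: state the isolation level explicitly with a table hint inside an
-- explicit transaction.
BEGIN TRAN;
    SELECT CustomerId, CustomerCode
    FROM dbo.Customer_InMem WITH (SNAPSHOT);
COMMIT;
```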
Validation phase
  1. Check for violations of isolation levels see read committed example on pg 107
  2. Wait for commit dependencies to go to 0
  3. Log the changes
  • This is the only waiting that can occur in memory optimized tables.
  • Waits could happen for commit dependencies and transaction log writes, both of which can be very short.
Validation
  1. Validate no changes violate isolation levels. Basically is the transaction globally transactionally consistent.
  2. Commit dependencies. A CD is when a transaction reads data from another transaction that has finished phase 1 of validation with valid end time stamps (transaction IDs or actual time stamps?). In practice very rare. There is an extended event and wait type: waiting_for_dependencytx_event.
  3. Logging. If any of the modified tables were created with SCHEMA_AND_DATA then SQL Server must log. SQL Server reads the write set to determine what to log; transactions track all their changes in the write set – a list of delete/insert ops with pointers to the version associated with each op
    1. Once a log record is hardened to storage the state of the transaction is changed to committed. Then reduce the dependency counters by one.
  4. Post processing. Updates time stamps of each row inserted or deleted by this transaction. Switches time stamps from transaction IDs to real time stamps.
  • Garbage collection – find rows whose time stamps are earlier than the oldest active transaction on the system; they are marked stale
  • The actual collection process is completed in cooperation with user threads during regular user access. If a stale row is encountered it will be marked expired or removed.
Chapter 6
The transaction log consists of
  • Log streams – changes made by committed transactions. Insertions and deletions of row versions.
  • Checkpoint streams
    • Data streams – contains all versions inserted during a timestamp interval
    • Delta streams – associated with data streams; contain a list of integers indicating which versions in the corresponding data stream have been deleted
  • No longer uses LSNs for sequencing; uses end-Ts for serialization order
  • Still only supports one log per database but due to more efficient logging the potential for log bottleneck is reduced.
  • Reduced logging – no need to log index operations. All indexes are rebuilt during recovery
  • Each atomic change isn't logged as its own log record; rather, In-Memory OLTP will combine multiple changes into a single record
  • In-Memory OLTP does not need to log uncommitted transactions – optimistic, remember?
  • For In-Memory OLTP no undo information is logged, just redo
  • 100 inserts into a disk-based table generate 200 log records; into an In-Memory OLTP table, only 3 (see the sketch below)
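A minimal sketch (not from the book) of one way to eyeball this yourself, reusing the made-up OrderHeader_InMem table from the earlier sketch. sys.fn_dblog is undocumented, so treat this as a test-system-only curiosity.

```sql
BEGIN TRAN;
    INSERT INTO dbo.OrderHeader_InMem (OrderId, OrderDate, Amount)
    VALUES (1, '2014-01-15', 10.00);
COMMIT;

-- Look at the most recent log records; repeat against a comparable disk-based
-- table to see the difference in record counts and types.
SELECT TOP (10) [Current LSN], Operation, Context, [Log Record Length]
FROM sys.fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;
```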
Checkpoint
  • Continuous checkpointing – checkpoints are continuous and incremental as activity accumulates, rather than periodic like with disk-based tables
  • Streaming I/O – i.e. sequential
  • Checkpoints are taken manually or automatically after every 512 MB of log growth
  • Checkpoint files are stored in file stream files
  • Two types of checkpoint files data and delta. Called a checkpoint file pair
    • Data file – new versions of rows in a time interval. Append only files. Once closed only read
    • Delta file – info about which rows have been deleted from the corresponding data file. Once the data file is closed, so is the delta file. At recovery it is used to filter out rows from the data file
  • CFP pairs can be recovered from concurrently
  • On insert a continuous checkpoint will append the row to a checkpoint file
  • A checkpoint event will close a set of files; their state goes from under construction to active. No more writes to these files, but they can be referred to by insert/update operations from other CFPs
  • Every checkpoint generates a new set of files. As such older files may no longer be needed and are merged
  • If no open transactions refer to the CFPs they transition to non-active. After a log backup they are no longer needed and can be removed by the garbage collection process
  • See page 135 for the states. sys.dm_db_xtp_checkpoint_files (see the sketch after these checkpoint notes)
  • Pre-created – pre built as an optimization
  • The number of pre-created CFPs is equal to the number of schedulers
  • Under construction – open CFPs that the continuous checkpoint is writing to. A checkpoint event will close them and change their state to active
  • Active – CFPs containing the inserted/deleted rows for the last checkpoint event. The continuous checkpoint is no longer writing new data to them, but deletes are still added to the delta files. During recovery, active CFPs contain all the inserted/deleted rows needed to restore the data before applying the tail of the log
  • Storage array – an 8,192-entry array holding CFP entries. Max 256 GB of CFP data
  • The CFPs in the storage array, along with the tail of the log, represent all the data needed to recover the In-Memory OLTP tables in a database
  • Before there are any active CFPs we can use the transaction log for recovery
  • Calling checkpoint is really closing a continuous checkpoint
  • It changes the under construction CFPs to active and adds them to the storage array. All log entries on In-Memory OLTP tables not covered by a previous checkpoint are converted into active CFPs.
  • Once finished:
    1. all buffered In-Memory OLTP writes are flushed to the data and delta files
    2. the checkpoint inventory (file descriptors from the previous checkpoint interval and any files added by the current checkpoint) is hardened to the log
    3. the location of the inventory is hardened to the log
  • A checkpoint event includes a timestamp, which indicates that the effects of all transactions before the checkpoint timestamp are in checkpoint files rather than the log, and thus the log is not needed to recover them. A log backup is still needed to truncate the log.
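A minimal sketch (not from the book) of a summary query against the DMV the notes reference; column names are as I recall them from SQL Server 2014.

```sql
-- Summarize checkpoint files by type and state to watch them move through the
-- lifecycle described above (pre-created, under construction, active, ...).
SELECT file_type_desc,
       state_desc,
       COUNT(*)                              AS file_count,
       SUM(file_size_in_bytes) / 1024 / 1024 AS total_size_mb
FROM sys.dm_db_xtp_checkpoint_files
GROUP BY file_type_desc, state_desc
ORDER BY file_type_desc, state_desc;
```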
Merging
  • Each checkpoint generates new files. The more files, the longer crash recovery will take
  • Files with adjacent timestamp ranges are merged when their active content drops below a percentage (see page 145 for the table), and also when adjacent files are < 50% full, which can happen during manual checkpoints
  • Due to this process, the largest storage space the CFPs can use is two times the size of the corresponding memory-optimized data
  • You can merge manually if needed. Very rare
  • Do the demos from this section
  • Merged source files are removed from the storage array and are eligible for garbage collection after checkpoint and log backups.
Garbage Collection of checkpoint files
  1. Checkpoint – guarantees that data in checkpoint files is no longer needed.
  2. Log backup – guarantee that the log truncation point is safely beyond where the CFPs will be needed.
  3. Unused files garbage collected – uses the same mechanism as regular file stream data.
  • We can force GC and filestream GC via stored procedures, pages 152-3 (see the sketch after these notes)
  • If you do a manual GC of files you'll need to make sure you account for the additional log backups.
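A minimal, hedged sketch of the garbage-collection procedures I believe the book is pointing to on pages 152-3 (verify the names and parameters against the text before relying on them; the database name is made up).

```sql
USE IMOLTP_Demo;

-- Force garbage collection of checkpoint files that are no longer needed...
EXEC sys.sp_xtp_checkpoint_force_garbage_collection;

-- ...then force filestream garbage collection to physically remove the files.
EXEC sp_filestream_force_garbage_collection @dbname = N'IMOLTP_Demo';
```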
Recovery
  • Starts after the most recent checkpoint inventory has been discovered during a scan of the tail of the log. SQL Server gives In-Memory OLTP this location and a parallel recovery can begin. The global transaction timestamp is initialized with the highest value found in the transactions recovered
  • Each delta file is used as a filter to eliminate rows from the corresponding data file. Each CFP can be processed in parallel, one thread per core
  • As data loads rows are linked into their indexes. Hash indexes – rows are linked into their buckets. Range indexes – added to an existing chain or created if the key doesn’t duplicate one already encountered during recovery
  • Once the CFP load is complete, the tail of the log is replayed from the time of the last checkpoint. Finished!
Chapter 7
  • A table or stored proc can be natively compiled
  • The performance benefits of native compilation are best in complex stored procedures. Use it for the most performance-critical parts
  • The T-SQL for tables and stored procedures generates C code, which is recompiled at instance startup
  • The execution plan is part of the DLL. There is no recompilation; you must drop and recreate the procedure if you want a new plan. Compiled on first execution and after instance restart or AG failover
  • Intended for short basic operations many complex constructs are not allowed
  • No parameter sniffing at compilation. Compiled with parameters UNKNOWN
  • When using hash indexes the query must specify all key columns in the WHERE clause or else it will scan. Also they are equality based – you must specify the whole value, not a partial string
  • Range indexes are forward-only; there are no previous pointers. You'd need both an ascending and a descending index
  • No Halloween protection in the plan. On disk-based tables this is done via spool operators; in In-Memory OLTP it's done via the statement ID in the row header
  • No parallel plans
  • No auto update stats. Because there’s no row mod counter
  • When creating stored procedures be sure data is loaded at compilation time
  • Native compilation is needed to achieve the performance gains of In-Memory OLTP – the goal was to reduce CPU instructions, which couldn't be done with interpreted stored procedures (see the sketch after these Chapter 7 notes)
  • Query stats are not captured by default; you need to enable them with a stored procedure (page 178). Can be used to collect on one object or on all
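A minimal sketch (not from the book) of a natively compiled procedure against the made-up OrderHeader_InMem table from the earlier sketch, showing the required options and the ATOMIC block.

```sql
CREATE PROCEDURE dbo.usp_GetOrder
    @OrderId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- The hash index on OrderId can be used for a seek because its single key
    -- column is supplied with an equality predicate; a partial or range predicate
    -- would force a scan.
    SELECT OrderId, OrderDate, Amount
    FROM dbo.OrderHeader_InMem
    WHERE OrderId = @OrderId;
END;
```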
Chapter 8
  • Memory allocation is critical. Running out of memory can cause operations to fail. If under memory pressure, In-Memory OLTP will more aggressively clean up old row versions
  • 256 GB max for all In-Memory OLTP tables in a DB. This is because of the limit on the number of checkpoint files
  • Rule of thumb: plan for 2x the size of the data for memory allocation – more if write-heavy, less if read-heavy. See pg 187 for more details
  • Use Resource Governor to control memory allocation. Many DBs can be in the same pool. A DB must be taken offline and then brought back online to start using the pool (see the sketch after these notes)
  • In sys.indexes, range indexes are type 2, the same as nonclustered. Hash indexes are type 7
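A minimal sketch (not from the book; the pool, database, and percentage are made up) of binding a database to a resource pool so its memory-optimized tables are capped by that pool.

```sql
CREATE RESOURCE POOL InMemoryPool WITH (MAX_MEMORY_PERCENT = 50);
ALTER RESOURCE GOVERNOR RECONFIGURE;

EXEC sys.sp_xtp_bind_db_resource_pool
     @database_name = N'IMOLTP_Demo',
     @pool_name     = N'InMemoryPool';

-- The binding only takes effect after the database is cycled offline/online.
ALTER DATABASE IMOLTP_Demo SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE IMOLTP_Demo SET ONLINE;
```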
Best practices 
  • Use a BIN2 collation at the column level rather than the database level
  • Be accurate in your bucket count. It should be close to the number of unique values
  • For low cardinality use range instead of hash
  • Stats are not automatically updated on In-Memory OLTP tables
  • Memory-optimized table variables behave the same as regular table variables but don't use tempdb, which could help relieve contention (see the sketch below)
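A minimal sketch (not from the book, names made up) of a memory-optimized table type; variables of this type live in the user database's memory space rather than tempdb.

```sql
CREATE TYPE dbo.OrderIdList AS TABLE
(
    OrderId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON);
GO

DECLARE @ids dbo.OrderIdList;
INSERT INTO @ids (OrderId) VALUES (1), (2), (3);
SELECT OrderId FROM @ids;
```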
Add summary of page 199
Good summary of where In-Memory OLTP fits in.

Book Review – Query Tuning & Optimization

Via Twitter in October, Benjamin Nevarez (@BenjaminNevarez) put out a call for reviewers for his new book, Microsoft SQL Server 2014 Query Tuning & Optimization. I jumped at this opportunity as Benjamin is a great author with significant SQL Server experience, and this new book further solidifies his place in the community.

Query processing and its inner workings are extremely dense technical concepts that Benjamin is able to present to the reader in a very approachable, readable manner. The book elegantly walks you through the major components of the query processor and dives deep where needed.

The code examples in the book are concise and accurately reinforce the key points the author is trying to convey in the text.

The book excels at connecting very complicated query processing topics together for the reader in a simple and approachable manner.

It covers the many new features of SQL Server 2014, including In-Memory OLTP, incremental statistics, the new cardinality estimator, and the new features added to columnstore indexes.

There are a few shortcomings with this book: I would have preferred citations throughout. Key concepts are introduced without citation, and I would like to jump into some topics, specifically the materials the author used when writing. The book does have an exceptional reference section, but there is no connection back to the text.

This book is now a permanent fixture in my library of SQL Server references; I will likely need to replace it when I wear it out.

If you need help with query tuning or other performance issues with your SQL Server please contact me at: aen@centinosystems.com

Query Tuning & Optimization

Here are some of the (unedited) notes that I took while reading the book, covering the first six chapters:

Chapter 1

Solid intro. Good examples, specifically in the warnings section on page 24, with references to upcoming chapters on how to deal with them. Gets off the ground with the appropriate groundwork/fundamentals of QP architecture and reading query plans needed to move on to the subsequent chapters.

Chapter 2
The chapter begins with the building blocks of query troubleshooting: how to find resource utilization with sys.dm_exec_requests and sys.dm_exec_sessions, plus the plan hash and query (text) hash for when you need to hunt down your code or plan. The CPU queries on page 51 are great, splitting on plan hash so each batch is in the results. The chapter then moves from the building blocks of hashes and text to tell the reader which tools are available to find troublesome queries on your systems.
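A minimal sketch (not from the book) of the kind of DMV query this chapter builds toward: currently running requests with their statement text and the query/plan hashes you would use to track a statement down later.

```sql
SELECT r.session_id,
       r.cpu_time,
       r.logical_reads,
       r.query_hash,
       r.query_plan_hash,
       SUBSTRING(t.text, (r.statement_start_offset / 2) + 1,
                 ((CASE r.statement_end_offset
                        WHEN -1 THEN DATALENGTH(t.text)
                        ELSE r.statement_end_offset
                   END - r.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;
```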

Chapter 3
Walks us through the query optimization process in great detail with concrete examples. Walks the line between academic and practical. Dissecting the QP at this level will give you great insight into the inner workings of SQL Server and allow you to create better-performing queries. Having references to academic and professional papers is a great resource, but I would have loved to have had reference numbers so I could go off on a deep dive when needed. This chapter clears the air and busts some myths found on the web, such as the idea that join order at the query level matters. It also gives you the ability to dive deep on your own questions so you can run off and experiment.

Chapter 4
The transition into this chapter connects the previous chapters; there is a solid example of this on pages 130-131. Ben quickly connects the ideas of operations to indexes and their potential efficacy/efficiency. When discussing the tipping point between a bookmark lookup and a scan, the reader is left wondering why – there's a lack of explanation. Everything else so far is covered in such great detail, but this wasn't. The parallelism limitations section on page 156 lacks any insight as to why those features will not produce a parallel plan.

Chapter 6
Avoid the use of local variables in a WHERE clause – the optimizer uses a 30 percent guess; use parameters or literals instead. The most concise explanation of the histogram yet. Wanted a deeper discussion of stats skew.
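A minimal sketch of the local-variable point (not from the book; it assumes an AdventureWorks-style Sales.SalesOrderDetail table): the variable hides the value from the optimizer, which falls back to a fixed guess, while the literal lets it use the histogram.

```sql
DECLARE @ProductID INT = 897;

-- Estimate comes from a fixed guess (roughly 30% of the rows for an inequality).
SELECT * FROM Sales.SalesOrderDetail WHERE ProductID > @ProductID;

-- Estimate comes from the statistics histogram for the literal value.
SELECT * FROM Sales.SalesOrderDetail WHERE ProductID > 897;
```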

First Blog Post

Well, this is my first blog post…I’ve said I was going to do this for a long time and it’s finally here. 

Via Twitter in October, Benjamin Nevarez (@BenjaminNevarez) put out a call for reviewers for his new book "Microsoft SQL Server 2014 Query Tuning & Optimization". I jumped at this opportunity for two reasons: first, Benjamin is an exceptional author, and second, I want to start contributing to the community. I feel that reviewing books can help others decide how to spend their valuable time and hard-earned money. Hopefully this effort helps someone in some way.

I've enjoyed doing this so much that when the folks at RedGate Software (@RedGate) asked for reviewers of Kalen Delaney's new book "SQL Server Internals: In-Memory OLTP", I jumped on that too! Hopefully I will get this one done quicker than the first :)