Saturday, September 17, 2011

Parallelised Queries are Selfish Queries

SQL Server Parallel Query Processing and MAXDOP

I am prompted into writing this post after reading the msdn article Understanding and Controlling Parallel Query Processing in SQL ServerIt is a very informative article.  I cannot disagree with anything in the article, but there is an elephant in the room.  The article dissregards IO performance bottlenecks.  It actually states that it will disregard IO performance.  "We assume that your I/O system is configured properly ..... so that the I/O system is not the primary performance limiter."  I am going to discuss IO performance with MAXDOP, because I believe it is often the main performance limiter.

As a bit of background, SQL Server has a parallel query processing feature that is enabled by default.  This permits SQL Optimiser to choose a plan that involves breaking one query into sub tasks that can run in parallel.  This can involve executing, essentially, the same plan, but in parallel subtasks, or it can involve a completely different plan and again breaking the query into subtasks that can run in parallel.  Really quite an amazing piece of technology. When it is working well for a query, it is interesting to see figures like cpu=10 seconds, elapsed=1 second!

In many of the very large database environments where I work, IO is the most critical resource.  Perhaps this is my misfortune, but I suggest that IO performance is often a performance limiter in large database systems, and probably will be so until SSD (solid state disks) are more commonly employed.  The problem with parallel query plans is that one query can simultaneously request multiple table scans, selfishly requesting more resources in parallel in the hope of getting a faster response.  Just one table scan from one query will tend to have parallel IO requests, as multiple SQL Server read engines are given multiple page read ahead requests.  So, even with a single non parallelised SQL query, there will already be some parallel IO requests.  Run several non parallelised queries at the same time and there are many more parallel IO requests than there are queries.  That is all before Parallel Query Processing kicks in!

Let's take a very simple example, where the SQL query has a predicate on a non indexed column of a very large table (the table is larger than SQL cache), therefore requiring a sequential scan of the table.  Without Parallel Query Processing, SQL Server will sequentially scan the very large table, using several read engines working in parallel to get multiple page ranges.  Note, there is already parallel IO requests with a single non parallelised SQL query.  Now if we enable Parallel Query, SQL can break the scan into several sub tasks, each of which will ask for range scans of the table.  Each of these scans can busy up multiple read engines, which can lead to IO throughput saturation.  That is the parallelised query will result in scanning the same table, but it will flood the IO system with many more parallel requests that can end up in a long queue.  A good way to think about it is to imagine an old fashioned bank branch with 30 tellers and everyone queues up for the next available teller.  That would be non parallelised.  Now imagine that someone comes in with bags of money that requires lots of counting and when he/she gets to the front of the queue, all the tellers attend to counting his/her money together (in parallel).  This might be good for the selfish individual, they get served faster, but it is not good for the performance of the branch, as now all customers will be waiting for the selfish customer to finish.  For this reason, I like to think of parallelised queries as selfish, or unsocial.  They want to take all the resources, even though this is not an efficient use of the overall resources and others are negatively impacted.  This is fine in a system with very few (perhaps one) customer.  Note, there is always an overhead in parallelising a query, even if it is using essentially the same plan as a non parallelised query.  This is because there is overhead in distributing and regathering the sub tasks.

There is an argument that data warehouses have very few users so that Parallel Query is an ideal performance option.  However, I see modern data warehouses with large numbers of users simultaneously submitting queries (dashboards, web pages, reports etc), and you typically don't want one user to selfishly hog critical resources such as IO.
I might sound like I am against Parallel Query.  That isn't true.  It is a fantastic feature.  However, I would encourage DBAs to consider setting Max Degree of Parallelism to 1.  Note; that is not disabling Parallel Query, it is just the default.  So, by default, queries will not be selfish.  If you have a particular ETL task or report query that you want to parallelise, you can simple add OPTION (MAXDOP n) to the query, where n is the number of potential sub tasks running in parallel.  So you are not turning it off, you are just ensuring that queries aren't selfish by default.  Also, just to be clear, with Max Degree of Parallelism set to 1, your system can still have many SQL queries running in parallel (with each other) it is only that individual queries will not be broken into parallel running subtasks.

The reason I configured Max Degree of Parallelism to 1 the first few times was because I was working in environments with a wayward parallelised queries.  SQL Optimiser chose a very very poor plan thinking that parallelisation would make this plan fast.  The same queries with Max Degree of Parallelism =1 (or MAXDOP 1) the optimiser chose a very different, and efficient plan.  When I reduced the "Max Degree of Parallelism" to 1, these queries, and the whole system ran more efficiently.

An annoying limitation of MAXDOP is that you are unable to set MAXDOP for queries that SQL Server Analysis Services uses when processing fact tables.  You can only provide SSAS with a table or view name, and it constructs the query.  There is no way to inject a MAXDOP option.  This is another reason I like to set Max Degree of Parallelism = 1, so that the default setting is non parallelised.

One word of warning.  Changing Max Degree of Parallelism to 1, can (in fact by design it should) make some queries run slower.  It can also change dramatically the query plan (as I discussed above) and while this will usually result in more efficient queries, it can also involve a poorer performing plan.  In my experience the reverse is more likely (when there are considerably different plans between parallelised and non parallelised, the non parallelised plan is more often the more efficient.  Consequently, it is less risky to configure Max Degree of Parallelism =1 while developing a solution (and using MAXDOP n for individual queries where appropriate) than to change a production environment.

The effect of Max Degree of Parallelism is not just on the individual query.  If you want to know the impact of Max Degree of Parallelism, you should monitor vital statistics (cpu, memory, IO), get average query response times (for common queries) and compare them with before and after  configuration changes.  Another, albeit small, cost of Max Degree of Parallelism<>1 is that each parallelised query will now have two plans (it always keeps a non parallel version for when cpu is high) so that more pressure is put on the plan cache (it is bigger, or it holds fewer plans).  Note, SQL Server is great in that you can change the Max Degree of Parallelism option dynamically.  No SQL Server restart required.  You might even consider setting Max Degree of Parallelism =1 for the online day and resetting Max Degree of Parallelism =0 for the overnight ETL.

A closely related issue is Resource Governor, which also has a MAXDOP option that looks like it can override the MAXDOP option in the query.  However, as I understand it, the Resource Governor does not work like MAXDOP in a query or sp_configure.  If you set MAXDOP to 1 in the Resource Governor, the underlying queries can still chose parallel plans (no change there).  What the Resource Governor does is to restrict the query to only one thread!   This is almost the worst of both worlds, as the optimiser can choose the selfish (potentially faster) parallel plan, but the Resource Governor will ensure that the parallel tasks are serialised with just one thread.

No comments: