I never thought about reducing query cost until I took a structured query language (SQL) course. I had been doing queries using database front-ends and back-ends, but I had never encountered the concept of “cost” specifically. I was aware of the fact that SAS – the statistical software – was originally built back in the early 1970s to reduce processor use, because they had to pay by the second!
I later revisited the concept of reducing query cost when it actually got expensive for my budget. I was managing a data lake at the US Army, and we had an excellent SAS consultant who was doing much of our data transformation. I noticed that she was playing solitaire a lot, and when I asked her about it, she exasperatedly explained she was “waiting for SAS”.
We were running PC SAS on her computer, but in reality, we needed the power of a server. She was excellent at reducing I/O with her data step programming – that was not the issue. We needed to think about reducing query cost another way. And, since we were paying her a high hourly fee just to play solitaire, we needed to consider reducing the cost of paying our consultants!
When Thinking About Reducing Query Cost – Time is Money!
It was when I was working at that data lake at the US Army that I learned how to study queries and see how much they cost. To be honest, I really did not know what I was doing at the beginning. I am formally trained in public health, so I had to take courses in informatics and business to learn what query cost actually was, and strategic approaches to reducing query cost.
My first step was learning a lot of terminology – such as “query cost” – which I did not know before. I found that it was very helpful to learn this new vocabulary from informatics and business, because some of it talked about phenomena we encounter in statistics, but don’t use a particular lexicon to describe.
I especially found the terms “front-end” and “back-end” to be useful when talking about data I was analyzing. It really helped me communicate with the various experts I had to talk to about improving I/O in my data lake so my SAS consultant did not have to play solitaire.
Strategies for Reducing Query Cost
My consultant was already employing the first strategy for reducing the cost of queries, and that is making well-formed code. I talk about how important this is in SAS in Chapter 2 of my book, “Mastering SAS Programming for Data Warehousing”. This is also important in SQL, but since SQL’s have optimizers, you can be a little sloppier with your code.
However, the bigger your data, the more I/O challenges you will encounter, so the first step for trying to reduce the cost of your queries just making sure you have well-formed code. Once you have optimal code and you still run into problems, you are going to have to think more of “environmental” solutions – meaning changing the environment of the data system to make it more amenable to the queries we were doing. In our case at the US Army, we really should have been using a SAS server – not PC SAS.
But in 2008 to 2011, when I worked there, no one would set up a new SAS server because it would have been too expensive. We even had access to an instance of Microsoft SQL we could use. So I wanted to know – would moving from PC SAS to MS SQL for our data storage result reduce our query cost?
That’s when I started doing the second thing you do when trying to reduce query cost , which is benchmark runtime. I had never heard of “benchmarking runtime” until I talked to engineers and programmers. I realized it wasn’t actually that hard, but how you do it is different in different programs.
Benchmarking runtime gave us the information to make our final decision about how to reconfigure our environment to reduce query cost. It’s a good thing we did – the consultant was absolutely sick of playing solitaire!
Updated June 10, 2023.
Reducing query cost is especially important in SAS – but do you know how to do it, or what it even means? Read my blog post to learn why this is important in health data analytics.