Referring to Variables in Processing Data is Different in SAS Compared to R

When you process data, especially in extract-transform-load (ETL) code that feeds a data warehouse, you constantly need to refer to variables in your code, and even more so when you automate that code. How you refer to them differs between SAS and R.

Referring to Variables in Processing when Using SAS

SAS is relatively restrictive in how you refer to variables in SAS datasets: you mainly refer to a variable by its name. This is why variable names are so important in SAS. If you rename a variable, any code that still refers to the old name will fail.

SAS handles this problem with macro variables. A macro variable holds a piece of text, such as a variable name, and SAS substitutes that text into your code wherever the macro variable is referenced. That way, if your processing depends on a name that might change, you can have the processing refer to the macro variable and update the name in one place. But you will need to budget for the overhead of setting the macro variable to the real variable name. This video shows an introduction to macro variables in SAS.

Referring to Variables in R is More Flexible than in SAS

R gives you several options for referring to variables in processing. Assuming we are talking about an R dataframe, we can refer to a column by its variable name. But we can also refer to it by its column number, which is not done very often in SAS.

Naming conventions differ between R and SAS, so variables end up named differently. Column names in an R dataframe can contain spaces, for example. That makes the names awkward to use in code, so programmers often prefer to use the column number. The video provides a demonstration.
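As a minimal sketch of the two approaches, the R code below builds a small dataframe whose column names contain spaces and then pulls out the same column by name and by number. The dataframe `visits` and its columns are made up for illustration; they do not come from the video.

```r
# Hypothetical example data; check.names = FALSE keeps the spaces in the column names.
visits <- data.frame(
  "patient id" = c(101, 102, 103),
  "visit count" = c(2, 5, 1),
  check.names = FALSE
)

# By name: a name containing a space must be quoted or wrapped in backticks.
by_name <- visits[["visit count"]]
by_backtick <- visits$`visit count`

# By column number: no quoting is needed.
by_number <- visits[[2]]

# All three forms return the same column.
identical(by_name, by_number)
```

Referring to the column as `visits[[2]]` avoids the quoting, but it assumes the column stays in the second position.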

Referring to Variables when Automating Processing in R and SAS

The difference matters most when you automate processing, and it is easier to see when you consider SAS and R side by side. In SAS, automated code typically gets at variables through macro variables that hold variable names from datasets (or the names of other entities, such as arrays). Therefore, for automated processing in SAS, you are pretty much stuck with using macro variables and macro processing.

But in R, it’s much less trouble to refer to columns in a dataframe by column number than by name. That’s because R has few restrictions on the format of column names in dataframes, so the names can get long and messy, and hard to refer to in code. On the other hand, referring to the column by number is easy in R, and it’s easy to automate in processing, because once a column number is selected in the automated process, there is no question about which column you mean.
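The following sketch shows one way this kind of automation might look in R: it looks up the column numbers of the numeric columns once and then processes each column by its number. The dataframe `labs`, its column names, and the choice to compute means are assumptions made for illustration only.

```r
# Hypothetical example data with long, messy column names.
labs <- data.frame(
  "Site / Clinic Name" = c("North", "South", "East"),
  "Systolic BP (mmHg), first reading" = c(120, 135, 128),
  "Systolic BP (mmHg), second reading" = c(118, 131, 130),
  check.names = FALSE
)

# Find the column numbers of the numeric columns once.
numeric_cols <- which(vapply(labs, is.numeric, logical(1)))

# Process each selected column by its number, never typing the long names.
col_means <- numeric(0)
for (i in numeric_cols) {
  col_means[names(labs)[i]] <- mean(labs[[i]])
}

col_means
```

The loop never has to spell out a column name, which is what makes it easy to automate. It does rely on the column order staying the same from one run to the next.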

Updated January 9, 2022.
