Variable names in SAS and R datasets, as you have probably noticed, often look different, even when the underlying dataset is the same. I’ll give you an example.
Variable Names Have to Follow the Rules of the Software
If you take my courses on LinkedIn Learning in SAS and R, you will see that I use the same dataset each time – the BRFSS dataset. One of the variables is named _STATE in the native dataset, which is served up in SAS.
You see the differences between variable names in SAS and R when the names contain special characters, such as an underscore. When I read the BRFSS dataset into R, the name of that variable changes to X_STATE. This is because R has a rule that a variable name can’t start with an underscore, so the name gets an X added to the front on import. And why the underscore is there in the first place has to do with the sort order of PROC SORT in SAS, which I cover in Chapter 3 of my book, “Mastering SAS Programming for Data Warehousing”.
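To see the renaming in action, here is a minimal sketch in R. It assumes the data arrive as CSV text; the two-row sample and the AGE column are made up for illustration and are not taken from the BRFSS, and other import routes may behave differently. By default, read.csv() passes column names through make.names(), which adds an X to the front of any name that starts with an underscore.

```r
# Made-up sample: a header row with a leading-underscore name, then two rows.
csv_text <- "_STATE,AGE\n1,34\n2,51"

# Default behavior: check.names = TRUE repairs names R considers invalid.
df_default <- read.csv(text = csv_text)
names(df_default)
# "X_STATE" "AGE"

# Turning the check off keeps the original name as it appears in the file.
df_raw <- read.csv(text = csv_text, check.names = FALSE)
names(df_raw)
# "_STATE" "AGE"

# A name that breaks R's rules has to be quoted with backticks when used.
df_raw$`_STATE`
```

Keeping the original name with check.names = FALSE is a trade-off: the name matches the SAS side, but every reference to it in R needs backticks.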
Variable Names Have Different Restrictions in SAS and R
You might think from my example that rules for naming variables are looser in SAS than R, because SAS allows you to name a variable starting with an underscore, and R does not. Actually, I would suggest it is the opposite. Here are some situations I’ve noticed where SAS has restrictions and R doesn’t:
- R variable names can be longer than SAS variable names
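- R variable names can have spaces in them (you quote such names with backticks), and SAS variable names cannot have spaces in them under the standard VALIDVARNAME=V7 naming rules
- R dataframes do not need column headings in the source data. If none are supplied, R fills in placeholder names such as V1 and V2, so you can effectively ignore variable names in R and work with column numbers. In SAS, you need to have an actual name for each variable or column in the dataset.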
This video explains the situation in R, where in a dataframe, variables can be referred to by their names as well as by their column numbers.
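As a rough illustration of the points above, the sketch below builds a small dataframe with a space in one column name and then reads header-less data. The dataframe, its column names, and its values are all invented for this example.

```r
# check.names = FALSE stops data.frame() from changing "first name"
# into the syntactically valid "first.name".
people <- data.frame(
  `first name` = c("Ann", "Bob"),
  age = c(30, 41),
  check.names = FALSE
)
names(people)
# "first name" "age"

# The same column can be reached by name or by position.
people[["age"]]
people[[2]]

# With no header row, R supplies placeholder names.
no_header <- read.csv(text = "1,34\n2,51", header = FALSE)
names(no_header)
# "V1" "V2"

# Column numbers work whether or not the names are meaningful.
no_header[[2]]
```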
Variable Names in SAS are More Restrictive than in R
It’s important to remember that SAS has been around since the early 1970s, and therefore, SAS datasets have been around since then. Early datasets had to follow very tight naming restrictions, which is why, in datasets that started in this era (like the BRFSS), you will often find that variables have all-caps names of no more than 8 characters.
Now those restrictions have relaxed a little, in that you can have uppercase and lowercase characters in a name, and you can have variable names longer than 8 characters. But there are still a lot of rules:
- You can use uppercase and lowercase letters to name the variables, but SAS ignores case when it processes the names. Therefore, FName and FNAME refer to the same variable in SAS, whereas R would treat them as two different variables.
- The first character of the name of a SAS variable must be a letter or an underscore – but not a digit (meaning 0, 1, 2, etc.).
- The underscore is the only special character that can be in a SAS variable name – and unlike R, SAS variable names still cannot contain blanks.
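The rules in this list can be written down as a small check. The sketch below is in R so it can sit next to the earlier R examples; is_valid_sas_name() is a hypothetical helper, not a SAS or R built-in, and the 32-character limit is an assumption based on SAS's standard V7 naming rules rather than something stated above.

```r
# Hypothetical helper: TRUE when a name follows the SAS rules listed above.
is_valid_sas_name <- function(x, max_len = 32) {
  starts_ok <- grepl("^[A-Za-z_]", x)        # first character: letter or underscore
  chars_ok  <- grepl("^[A-Za-z0-9_]+$", x)   # only letters, digits, underscores
  starts_ok & chars_ok & nchar(x) <= max_len # assumed V7 length limit
}

candidates <- c("_STATE", "FName", "1stVisit", "first name", "pct$")
valid <- is_valid_sas_name(candidates)
valid
# TRUE TRUE FALSE FALSE FALSE

# Case: SAS would see these as one name, while R compares them exactly.
same_in_sas <- toupper("FName") == toupper("FNAME")  # TRUE
same_in_r   <- "FName" == "FNAME"                    # FALSE
```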
This video shows you how to figure out the names of the variables in a SAS dataset.
As you can see, variable names in SAS and R can be very different. If you are running a data warehouse or other large data operation, and you are handling data from many different providers, you might also be serving up data to different types of users. In that case, you will want to be very careful about choosing variable names for the variables in your data warehouse, because SAS programmers will be used to certain naming conventions, and R users will be used to others.
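One way to act on this is to screen proposed warehouse names before they are adopted. The following is only a sketch under stated assumptions: check_portable_names() is a hypothetical helper, the 32-character limit assumes SAS's standard V7 naming rules, and "portable" here just means the name passes the SAS rules described earlier, is left unchanged by R's make.names(), and does not collide with another name when case is ignored.

```r
# Hypothetical helper: flag names likely to cause trouble in SAS or R.
check_portable_names <- function(x, max_len = 32) {
  # SAS side: letter or underscore first, then letters, digits, underscores.
  sas_ok <- grepl("^[A-Za-z_][A-Za-z0-9_]*$", x) & nchar(x) <= max_len
  # R side: make.names() leaves an already-valid name untouched.
  r_ok <- make.names(x) == x
  # Names that differ only by case would be one variable in SAS.
  upper <- toupper(x)
  case_clash <- duplicated(upper) | duplicated(upper, fromLast = TRUE)
  data.frame(
    name = x,
    sas_ok = sas_ok,
    r_ok = r_ok,
    case_clash = case_clash,
    portable = sas_ok & r_ok & !case_clash
  )
}

report <- check_portable_names(
  c("STATE", "_STATE", "FName", "FNAME", "first name")
)
report$portable
# TRUE FALSE FALSE FALSE FALSE
```

In this made-up list, only STATE passes all three checks: _STATE would be renamed by R, FName and FNAME clash in SAS, and "first name" fails on both sides.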
Updated January 5, 2022.