Counting Rows in SAS and R Uses Totally Different Strategies

If you are a data scientist working with large datasets, you need to learn the commands to count both columns and rows in the dataset, whether you are using SAS or R.

Counting rows in SAS and R comes up often: when you are doing extract-transform-load (ETL) on datasets, you constantly need to confirm how many rows each dataset contains. But the two programs take very different approaches to counting rows, largely because they handle datasets in different ways.

Counting Rows in SAS and R is Shaped by Their Environments

Both SAS and R have commands that count rows in datasets in their respective environments. But the two environments are built on different design philosophies.

  • In SAS, PROCs (which are often huge, and behave essentially like super complex macros) are designed to generate a certain standard set of formatted output. Options can be set when the PROC is called to change the format of the output from the standard.
  • In R, base functions tend to be very modular and sparse. This allows the programmer to combine them to build complex output.
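
To make the point about modular R functions concrete, here is a minimal sketch that combines three small base functions into one line of output. It assumes the built-in mtcars example dataset that ships with R; the variable name summary_line is just an illustrative choice.

```r
# Each base function does one small job: nrow() counts rows,
# ncol() counts columns, and paste() builds a string from pieces.
# mtcars is an example dataset that ships with base R.
summary_line <- paste(
  "mtcars has", nrow(mtcars), "rows and", ncol(mtcars), "columns"
)

print(summary_line)
```

No single function produces this report; the programmer assembles it from small parts, which is the opposite of a PROC that prints a standard formatted report by default.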

Counting Rows in SAS with PROC CONTENTS

Rows are typically counted in SAS using PROC CONTENTS. However, as you can see in the screenshot I took of some PROC CONTENTS output, SAS refers to rows as “observations”.

This is an example of PROC CONTENTS output. Note that SAS calls number of rows "observations" on the report.

Counting Rows in a Dataframe in R

While R has many objects that may behave like datasets in some ways (such as matrices), data analysts tend to put datasets in dataframe objects in R. Each row of the dataframe is called a “row” (unlike in SAS, where it’s called an “observation,” as I mentioned above). In fact, R has a function called row.names that gets or sets the names that uniquely identify the rows in a dataframe (although using it is fraught with issues).

Typically, rows in dataframes are counted by using the nrow() function in R, as demonstrated in the video.
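
As a minimal sketch of nrow() in action, the example below counts the rows of a tiny dataframe. The patients dataframe and its columns are made up purely for illustration; ncol() and dim() are shown alongside because they are the matching base R functions for columns and for both dimensions at once.

```r
# A small, made-up dataframe for illustration only
patients <- data.frame(
  id  = c(101, 102, 103, 104),
  age = c(34, 51, 29, 46)
)

n_rows <- nrow(patients)  # number of rows
n_cols <- ncol(patients)  # number of columns
dims   <- dim(patients)   # rows first, then columns

print(n_rows)
print(n_cols)
print(dims)
```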

The reason to get good at counting rows in SAS and R is that subsetting is where ETL mistakes creep in. Every time you use a subset command in R to remove rows from a dataframe, or apply WHERE criteria to a column in SAS to subset a dataset, you should count the rows in the resulting dataset and confirm the number is what you expected.
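
One way to build this check into R code is sketched below. The visits dataframe, its status column, and the variable names are hypothetical; the idea is simply to count rows before and after the subset and compare the difference against an independent count of the rows that should have been removed.

```r
# A made-up dataframe for illustration only
visits <- data.frame(
  id     = 1:6,
  status = c("complete", "complete", "missing",
             "complete", "missing", "complete")
)

# Count rows before subsetting
rows_before <- nrow(visits)

# Keep only the complete visits
complete_visits <- subset(visits, status == "complete")

# Count rows after subsetting
rows_after <- nrow(complete_visits)
rows_dropped <- rows_before - rows_after

# Independently count the rows that should have been dropped
expected_dropped <- sum(visits$status != "complete")

# Stop with an error if the two counts disagree
stopifnot(rows_dropped == expected_dropped)

print(c(before = rows_before, after = rows_after, dropped = rows_dropped))
```

If the subset condition were mistyped, the row counts would no longer reconcile and stopifnot() would halt the script instead of letting the error pass silently into later steps.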

Counting rows in SAS and R is approached differently, because the two programs process data in different ways. Read my blog post where I describe both ways.
