Table Editing in R is Easy! Here Are a Few Tricks…

When you use a data analysis program like R or SAS, you often have to do some data editing. It can be difficult because the software was intended for calculations, not transformation.

Table editing in R cannot be adequately covered in one blog post, but I thought I’d show you a few tricks I use, specifically for making tables with summary values in them.

In this demonstration, we will be using a subset of BRFSS data, and we will making a table with summary values in it. First we will make a blank table with just the headings across the top and the row titles along the y-axis. Then, we will fill in all the cells in the table with the correct values from the BRFSS data. So table editing in R is done through making a base table and then editing it.

To download the dataset we will be using, go to this folder on GitHub, and download the dataset named BRFSS_i.rds (last one in the folder). Let’s import that dataframe into R GUI. You can get the code I’m using here.

Table Editing in R Prep: Importing Source Dataframe

If you open R GUI and map the console to the directory where you put the dataframe BRFSS_i.rds, you should be able to run the following code and import it into R.

BRFSS <- readRDS(file="BRFSS_i.rds")
colnames(BRFSS)
nrow(BRFSS)

Once in R, the dataframe is called BRFSS, and the colnames command will show you the different columns. The nrow command will show you there are a little over 58,000 rows.

Preparation for Table Editing in R

Let’s pretend we are comparing people of different smoking statuses. I used the native BRFSS variables to create a new variable called SMOKGRP, which has the values of 1 = Current smoker, 2 = Current non-smoker, 9 = Unknown. Let’s see the frequencies in that variable.

CODE:
table(BRFSS$SMOKGRP)

OUTPUT:
    1     2     9 
 8571 49261   299

These three statuses will go across the x-axis of my table. I also want a column called “All” to include for the entire dataset.

Because this is just a quick demonstration, we will only add summary values in two rows – one for educational group (EDGROUP), and one for Hispanic status (HISPANIC = 1). Again, these are variables I created from native BRFSS variables. Let’s look at their frequencies.

CODE:
table(BRFSS$EDGROUP)
table(BRFSS$HISPANIC)

OUTPUT:
    1     2     3     4     9 
 2483 16241 17559 21742   106
    0     1 
55869  2262

For EDGROUP, 1 = Less than high school, 2 = High school graduate, 3 = Some college, 4 = College degree, and 9 = Unknown. For these, we will want one row for each level. But for HISPANIC, we will just report the value for “Yes”, which is coded as 1.

Different Methods of Table Editing in R

There are so many ways to make a table from scratch in R! I’m just demonstrating one way here. In the way I’m showing here, I first make a vector for each column in my table, and then I “bind” them together like macrame and it’s a table! Just about any way you do it, you usually end up doing a little table editing in R by the end.

In this video, I show you a different way to do it than we are discussing on this blog post, and that is how to make a matrix in R first, and then convert it to a table.

Making the Base Vectors

So next, we will make a vector for each of our columns. We will have five columns for our summary table.

  • CatLev: This stands for category/level. This is a short character string to remind me what is on the row.
  • All: This column will hold the frequencies in the dataset for each category level.
  • CurSmok: This column will hold the frequencies just for the current smokers.
  • NSmok: This will hold the frequencies for the non-smokers.
  • UnkSmok: This will hold the frequencies for those with unknown smoking status.

Let’s make the first vector called CatLev.

CatLev <- c("All", "Ed-LTHS", "Ed-HS", "Ed-SOMECOLL", "Ed-GRAD", "Ed-Unk",
	"Hispanic-Yes")

See how each member of the vector is a shorthand character string for each row I will need in the table? The top row will have summary frequencies for the entire dataset (“All”), and then we have each level of EDGROUP, and then we have a place for the frequency of those who identified as Hispanic.

To get the length of this vector, you could run the command length(CatLev). We use this feature of R to create the other three vectors which are entirely filled with 0’s using the rep commend in anticipation of being updated with frequencies in subsequent programming.

CurSmok <- rep(0, length(CatLev))
NSmok <- rep(0, length(CatLev))
UnkSmok <- rep(0, length(CatLev))

The rep command asks for two arguments: what to repeat, and how many times to repeat it. Each time, we tell it to repeat a 0, and we tell it to do as many times as there are members of CatLev. Since there are 7 members of CatLev, CurSmok, NSmok, and UnkSmok all are just vectors with 7 0’s in them after this step.

Binding the Columns Together

Next, we are going to bind the columns together into a dataframe using a data.frame command. It was important what we named our vectors, because those names become the name of our columns. We are creating a dataframe called tbl.

tbl <- data.frame(CatLev, All, CurSmok, NSmok, UnkSmok)

Let’s run tbl and look at it so far.

        CatLev All CurSmok NSmok UnkSmok
1          All   0       0     0       0
2      Ed-LTHS   0       0     0       0
3        Ed-HS   0       0     0       0
4  Ed-SOMECOLL   0       0     0       0
5      Ed-GRAD   0       0     0       0
6       Ed-Unk   0       0     0       0
7 Hispanic-Yes   0       0     0       0

As you can see, it’s empty and ready for us to replace the 0’s with frequencies.

Referring to Cells in Dataframes in R

Let’s first go over how to refer to rows, columns, and cells in dataframes in R by number. In the output above, we have 7 rows which are numbered. We see our 5 columns we made. The name of this table is tbl. So, if I want to refer to a particular cell in this table, I state the name of the table, then in brackets, I put the row number, a comma, and a column number.

Let’s say I was referring to the last cell in the table – which would be the frequency of Hispanics with unknown smoking status. That cell would be referred to in programming as tbl[7,5], because that cell is in the dataframe called tbl, and is on the 7th row, and in the 5th column.

Imagine I wanted to refer to the entire Hispanic-Yes row. I could do that by saying tbl[7,], because this indicates all members of the 7th row. Or, I could refer to the entire UnkSmok column by saying tbl[,5].

Filling in the Top Row

Let’s look at cell tbl[1,2]. This presumably will have the total n in the dataset. We can store that as a value in a variable called total_n this way.

total_n <- nrow(BRFSS)

Now, we can use the variable total_n in any calculations we want (e.g., denominators of percentages). I won’t demonstrate that in this tutorial – but I will show you how to populate our cell tbl[1,2] with that value.

tbl[1,2] <- total_n

Of course, we could have done this in one command by just saying tbl[1,2] <- nrow(BRFSS), but I wanted to demonstrate how you can save a value in a variable and then reuse the variable if you want.

We still have three more values to fill in on the top row. It’s the frequencies for the smoking statuses in the dataframe. To do that, I start by using the table command to make a one-way frequency of SMOKGRP – but I wrap this in a data.frame command so it structures the results like a dataframe, not a matrix. Also, I save it as an object called smok_freqs.

smok_freqs <- as.data.frame(table(BRFSS$SMOKGRP))

You already know these frequencies because I showed you them before. However, when you structure the results like a dataframe, they look and feel very different. Here is what smok_freqs looks like.

  Var1  Freq
1    1  8571
2    2 49261
3    9   299

Notice this is a dataframe, so the first column is literally named Var1 and the second is named Freq. You can query this table like any dataframe in R. But all we want it for is to copy values from it into our table named tbl.

You have probably noticed that the numbers we want are under the Freq column. We basically want those three numbers to appear in row 1, columns 3 through 5 in tbl. Here is how we do this.

tbl[1,3:5] <- smok_freqs[1:3,2]

As you can see in the code, we are specifying row 1 and columns 3:5 for tbl on the left side of the arrow. On the right side, we are telling R to take the values from rows 1 through 3 in column 2 from the table smok_freqs, and put them there.

Here is what tbl looks like after this step.

        CatLev   All CurSmok NSmok UnkSmok
1          All 58131    8571 49261     299
2      Ed-LTHS     0       0     0       0
3        Ed-HS     0       0     0       0
4  Ed-SOMECOLL     0       0     0       0
5      Ed-GRAD     0       0     0       0
6       Ed-Unk     0       0     0       0
7 Hispanic-Yes     0       0     0       0

Filling in the Grouping Variable Frequencies

Now let’s tackle the grouping variable, EDGROUP. We can make a one-way frequency table called ed_freqs to store our values using the as.data.frame trick I showed you above.

ed_freqs <- as.data.frame(table(BRFSS$EDGROUP))

Then we can use the values from that to populate tbl. As it turns out, the entire second column of ed_freqs contains the values we want. In tbl, those values should land in the second to sixth rows in the second column, so we use this coding.

tbl[2:6,2] <- ed_freqs[,2]

Now, we have to fill in the frequencies for the crosstabs between SMOKGRP and EDGROUP. Here is an approach.

ed_smok_freqs <- as.data.frame(table(BRFSS$EDGROUP, BRFSS$SMOKGRP))
tbl[2:6,3] <- ed_smok_freqs[1:5,3]
tbl[2:6,4] <- ed_smok_freqs[6:10,3]
tbl[2:6,5] <- ed_smok_freqs[11:15,3]

In this code, we create a two-way frequency table between EDGROUP and SMOKGRP which we name ed_smok_freqs. If you do this and run the table ed_smok_freqs, you will see that it is a long, skinny table with these columns: Var1, Var2, and Freqs. Since we placed EDGROUP as the first variable in our two-way frequency, Var1 is the value of EDGROUP, and Var2 is the value of SMOKGRP.

You will see the code above I used to populate tbl using ed_smok_freqs. I’ll show you what ed_smok_freqs looks like below, and then you will probably be able to see what I’m doing in the code above.

   Var1 Var2  Freq
1     1    1   556
2     2    1  3134
3     3    1  3148
4     4    1  1723
5     9    1    10
6     1    2  1919
7     2    2 13034
8     3    2 14317
9     4    2 19902
10    9    2    89
11    1    9     8
12    2    9    73
13    3    9    94
14    4    9   117
15    9    9     7

Filling in the Indicator Variable Frequencies

Filling in the row for Hispanic-Yes is really no different than what we have been doing except it leaves out a whole level (i.e., we don’t report the frequencies of those who said anything but Hispanic-Yes). We start by doing the one-way frequency for all and saving the values in hisp_freqs, and adding that to value to the “all” column in tbl.

hisp_freqs <- as.data.frame(table(BRFSS$HISPANIC))
tbl[7,2] <- hisp_freqs[2,2]

But here is where we do something a little different. Making the two-way frequency table dataframe named hisp_smok_freq using HISPANIC and SMOKGRP should be familiar to you because we did it above with EDGROUP and SMOKGRP. However, what is different is that there are three commands after that to cherry-pick the “yes” frequencies out of hisp_smok_freq and put them in the correct cells in the table.

hisp_smok_freqs <- as.data.frame(table(BRFSS$HISPANIC, BRFSS$SMOKGRP))
tbl[7,3] <- hisp_smok_freqs[2,3]
tbl[7,4] <- hisp_smok_freqs[4,3]
tbl[7,5] <- hisp_smok_freqs[6,3]

As you can see, we have to cherry-pick the value for current smokers who are Hispanics out of hisp_smok_freqs[2,3] and put it in tbl[7,3]. Then, we have to skip a row in hisp_smok_freqs and cherry-pick out row 4, column 3 to put in tbl[7,4] for the non-smoking Hispanics. What’s going on in the code becomes clearer when you look at what hisp_smok_freqs looks like.

  Var1 Var2  Freq
1    0    1  8207
2    1    1   364
3    0    2 47374
4    1    2  1887
5    0    9   288
6    1    9    11

At this point, the entire tbl should be filled in. Here’s what it looks like.

        CatLev   All CurSmok NSmok UnkSmok
1          All 58131    8571 49261     299
2      Ed-LTHS  2483     556  1919       8
3        Ed-HS 16241    3134 13034      73
4  Ed-SOMECOLL 17559    3148 14317      94
5      Ed-GRAD 21742    1723 19902     117
6       Ed-Unk   106      10    89       7
7 Hispanic-Yes  2262     364  1887      11

Exporting the Dataframe

Here is our last step to table editing in R. What I normally do at this point is rename this table with a real name (e.g., demo_tbl for demographic table) and export this into a *.csv of the same name. Then, I literally copy and paste these values in to a real MS Excel spreadsheet that has a table that I’m creating as part of a scientific publication. This is the only table editing in R I do; all I needed from R are these frequencies. Using programming in Excel, I can calculate any proportion I want, and display the cells however I want. It’s better to prepare tables for publication in Excel where you easily fuss with the appearance rather than R. However, if you are making figures for peer-reviewed articles, I strongly recommend R.

So here are my last two steps – renaming tbl to some real name, then exporting it as a *.csv. This *.csv will land in the same folder as you mapped the R GUI console to (to import the BRFSS_i.rds dataframe).

demo_tbl <- tbl
write.csv(demo_tbl, "demo_tbl.csv")

Updated August 19, 2023. Added video October 15, 2023.

Read all of our data science blog posts!

Table editing in R is easier than in SAS, because you can refer to columns, rows, and individual cells in the same way you do in MS Excel. Read my blog post for example R table editing code.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Verified by MonsterInsights