Table Editing in R is Easy! Here Are a Few Tricks…

When you use a data analysis program like R or SAS, you often have to do some data editing. It can be difficult because the software was intended for calculations, not transformation.

Table editing in R cannot be adequately covered in one blog post, but I thought I’d show you a few tricks I use, specifically for making tables with summary values in them.

In this demonstration, we will be using a subset of BRFSS data, and we will making a table with summary values in it. First we will make a blank table with just the headings across the top and the row titles along the y-axis. Then, we will fill in all the cells in the table with the correct values from the BRFSS data. So table editing in R is done through making a base table and then editing it.

To download the dataset we will be using, go to this folder on GitHub, and download the dataset named BRFSS_i.rds (last one in the folder). Let’s import that dataframe into R GUI. You can get the code I’m using here.

Table Editing in R Prep: Importing Source Dataframe

If you open R GUI and map the console to the directory where you put the dataframe BRFSS_i.rds, you should be able to run the following code and import it into R.

BRFSS <- readRDS(file="BRFSS_i.rds")
colnames(BRFSS)
nrow(BRFSS)

Once in R, the dataframe is called BRFSS, and the colnames command will show you the different columns. The nrow command will show you there are a little over 58,000 rows.

Preparation for Table Editing in R

Let’s pretend we are comparing people of different smoking statuses. I used the native BRFSS variables to create a new variable called SMOKGRP, which has the values of 1 = Current smoker, 2 = Current non-smoker, 9 = Unknown. Let’s see the frequencies in that variable.

CODE:
table(BRFSS$SMOKGRP)

OUTPUT:
    1     2     9 
 8571 49261   299

These three statuses will go across the x-axis of my table. I also want a column called “All” to include for the entire dataset.

Because this is just a quick demonstration, we will only add summary values in two rows – one for educational group (EDGROUP), and one for Hispanic status (HISPANIC = 1). Again, these are variables I created from native BRFSS variables. Let’s look at their frequencies.

CODE:
table(BRFSS$EDGROUP)
table(BRFSS$HISPANIC)

OUTPUT:
    1     2     3     4     9 
 2483 16241 17559 21742   106
    0     1 
55869  2262

For EDGROUP, 1 = Less than high school, 2 = High school graduate, 3 = Some college, 4 = College degree, and 9 = Unknown. For these, we will want one row for each level. But for HISPANIC, we will just report the value for “Yes”, which is coded as 1.

Different Methods of Table Editing in R

There are so many ways to make a table from scratch in R! I’m just demonstrating one way here. In the way I’m showing here, I first make a vector for each column in my table, and then I “bind” them together like macrame and it’s a table! Just about any way you do it, you usually end up doing a little table editing in R by the end.

In this video, I show you a different way to do it than we are discussing on this blog post, and that is how to make a matrix in R first, and then convert it to a table.

Making the Base Vectors

So next, we will make a vector for each of our columns. We will have five columns for our summary table.

  • CatLev: This stands for category/level. This is a short character string to remind me what is on the row.
  • All: This column will hold the frequencies in the dataset for each category level.
  • CurSmok: This column will hold the frequencies just for the current smokers.
  • NSmok: This will hold the frequencies for the non-smokers.
  • UnkSmok: This will hold the frequencies for those with unknown smoking status.

Let’s make the first vector called CatLev.

CatLev <- c("All", "Ed-LTHS", "Ed-HS", "Ed-SOMECOLL", "Ed-GRAD", "Ed-Unk",
	"Hispanic-Yes")

See how each member of the vector is a shorthand character string for each row I will need in the table? The top row will have summary frequencies for the entire dataset (“All”), and then we have each level of EDGROUP, and then we have a place for the frequency of those who identified as Hispanic.

To get the length of this vector, you could run the command length(CatLev). We use this feature of R to create the other three vectors which are entirely filled with 0’s using the rep commend in anticipation of being updated with frequencies in subsequent programming.

CurSmok <- rep(0, length(CatLev))
NSmok <- rep(0, length(CatLev))
UnkSmok <- rep(0, length(CatLev))

The rep command asks for two arguments: what to repeat, and how many times to repeat it. Each time, we tell it to repeat a 0, and we tell it to do as many times as there are members of CatLev. Since there are 7 members of CatLev, CurSmok, NSmok, and UnkSmok all are just vectors with 7 0’s in them after this step.

Binding the Columns Together

Next, we are going to bind the columns together into a dataframe using a data.frame command. It was important what we named our vectors, because those names become the name of our columns. We are creating a dataframe called tbl.

tbl <- data.frame(CatLev, All, CurSmok, NSmok, UnkSmok)

Let’s run tbl and look at it so far.

        CatLev All CurSmok NSmok UnkSmok
1          All   0       0     0       0
2      Ed-LTHS   0       0     0       0
3        Ed-HS   0       0     0       0
4  Ed-SOMECOLL   0       0     0       0
5      Ed-GRAD   0       0     0       0
6       Ed-Unk   0       0     0       0
7 Hispanic-Yes   0       0     0       0

As you can see, it’s empty and ready for us to replace the 0’s with frequencies.

Referring to Cells in Dataframes in R

Let’s first go over how to refer to rows, columns, and cells in dataframes in R by number. In the output above, we have 7 rows which are numbered. We see our 5 columns we made. The name of this table is tbl. So, if I want to refer to a particular cell in this table, I state the name of the table, then in brackets, I put the row number, a comma, and a column number.

Let’s say I was referring to the last cell in the table – which would be the frequency of Hispanics with unknown smoking status. That cell would be referred to in programming as tbl[7,5], because that cell is in the dataframe called tbl, and is on the 7th row, and in the 5th column.

Imagine I wanted to refer to the entire Hispanic-Yes row. I could do that by saying tbl[7,], because this indicates all members of the 7th row. Or, I could refer to the entire UnkSmok column by saying tbl[,5].

Filling in the Top Row

Let’s look at cell tbl[1,2]. This presumably will have the total n in the dataset. We can store that as a value in a variable called total_n this way.

total_n <- nrow(BRFSS)

Now, we can use the variable total_n in any calculations we want (e.g., denominators of percentages). I won’t demonstrate that in this tutorial – but I will show you how to populate our cell tbl[1,2] with that value.

tbl[1,2] <- total_n

Of course, we could have done this in one command by just saying tbl[1,2] <- nrow(BRFSS), but I wanted to demonstrate how you can save a value in a variable and then reuse the variable if you want.

We still have three more values to fill in on the top row. It’s the frequencies for the smoking statuses in the dataframe. To do that, I start by using the table command to make a one-way frequency of SMOKGRP – but I wrap this in a data.frame command so it structures the results like a dataframe, not a matrix. Also, I save it as an object called smok_freqs.

smok_freqs <- as.data.frame(table(BRFSS$SMOKGRP))

You already know these frequencies because I showed you them before. However, when you structure the results like a dataframe, they look and feel very different. Here is what smok_freqs looks like.

  Var1  Freq
1    1  8571
2    2 49261
3    9   299

Notice this is a dataframe, so the first column is literally named Var1 and the second is named Freq. You can query this table like any dataframe in R. But all we want it for is to copy values from it into our table named tbl.

You have probably noticed that the numbers we want are under the Freq column. We basically want those three numbers to appear in row 1, columns 3 through 5 in tbl. Here is how we do this.

tbl[1,3:5] <- smok_freqs[1:3,2]

As you can see in the code, we are specifying row 1 and columns 3:5 for tbl on the left side of the arrow. On the right side, we are telling R to take the values from rows 1 through 3 in column 2 from the table smok_freqs, and put them there.

Here is what tbl looks like after this step.

        CatLev   All CurSmok NSmok UnkSmok
1          All 58131    8571 49261     299
2      Ed-LTHS     0       0     0       0
3        Ed-HS     0       0     0       0
4  Ed-SOMECOLL     0       0     0       0
5      Ed-GRAD     0       0     0       0
6       Ed-Unk     0       0     0       0
7 Hispanic-Yes     0       0     0       0

Filling in the Grouping Variable Frequencies

Now let’s tackle the grouping variable, EDGROUP. We can make a one-way frequency table called ed_freqs to store our values using the as.data.frame trick I showed you above.

ed_freqs <- as.data.frame(table(BRFSS$EDGROUP))

Then we can use the values from that to populate tbl. As it turns out, the entire second column of ed_freqs contains the values we want. In tbl, those values should land in the second to sixth rows in the second column, so we use this coding.

tbl[2:6,2] <- ed_freqs[,2]

Now, we have to fill in the frequencies for the crosstabs between SMOKGRP and EDGROUP. Here is an approach.

ed_smok_freqs <- as.data.frame(table(BRFSS$EDGROUP, BRFSS$SMOKGRP))
tbl[2:6,3] <- ed_smok_freqs[1:5,3]
tbl[2:6,4] <- ed_smok_freqs[6:10,3]
tbl[2:6,5] <- ed_smok_freqs[11:15,3]

In this code, we create a two-way frequency table between EDGROUP and SMOKGRP which we name ed_smok_freqs. If you do this and run the table ed_smok_freqs, you will see that it is a long, skinny table with these columns: Var1, Var2, and Freqs. Since we placed EDGROUP as the first variable in our two-way frequency, Var1 is the value of EDGROUP, and Var2 is the value of SMOKGRP.

You will see the code above I used to populate tbl using ed_smok_freqs. I’ll show you what ed_smok_freqs looks like below, and then you will probably be able to see what I’m doing in the code above.

   Var1 Var2  Freq
1     1    1   556
2     2    1  3134
3     3    1  3148
4     4    1  1723
5     9    1    10
6     1    2  1919
7     2    2 13034
8     3    2 14317
9     4    2 19902
10    9    2    89
11    1    9     8
12    2    9    73
13    3    9    94
14    4    9   117
15    9    9     7

Filling in the Indicator Variable Frequencies

Filling in the row for Hispanic-Yes is really no different than what we have been doing except it leaves out a whole level (i.e., we don’t report the frequencies of those who said anything but Hispanic-Yes). We start by doing the one-way frequency for all and saving the values in hisp_freqs, and adding that to value to the “all” column in tbl.

hisp_freqs <- as.data.frame(table(BRFSS$HISPANIC))
tbl[7,2] <- hisp_freqs[2,2]

But here is where we do something a little different. Making the two-way frequency table dataframe named hisp_smok_freq using HISPANIC and SMOKGRP should be familiar to you because we did it above with EDGROUP and SMOKGRP. However, what is different is that there are three commands after that to cherry-pick the “yes” frequencies out of hisp_smok_freq and put them in the correct cells in the table.

hisp_smok_freqs <- as.data.frame(table(BRFSS$HISPANIC, BRFSS$SMOKGRP))
tbl[7,3] <- hisp_smok_freqs[2,3]
tbl[7,4] <- hisp_smok_freqs[4,3]
tbl[7,5] <- hisp_smok_freqs[6,3]

As you can see, we have to cherry-pick the value for current smokers who are Hispanics out of hisp_smok_freqs[2,3] and put it in tbl[7,3]. Then, we have to skip a row in hisp_smok_freqs and cherry-pick out row 4, column 3 to put in tbl[7,4] for the non-smoking Hispanics. What’s going on in the code becomes clearer when you look at what hisp_smok_freqs looks like.

  Var1 Var2  Freq
1    0    1  8207
2    1    1   364
3    0    2 47374
4    1    2  1887
5    0    9   288
6    1    9    11

At this point, the entire tbl should be filled in. Here’s what it looks like.

        CatLev   All CurSmok NSmok UnkSmok
1          All 58131    8571 49261     299
2      Ed-LTHS  2483     556  1919       8
3        Ed-HS 16241    3134 13034      73
4  Ed-SOMECOLL 17559    3148 14317      94
5      Ed-GRAD 21742    1723 19902     117
6       Ed-Unk   106      10    89       7
7 Hispanic-Yes  2262     364  1887      11

Exporting the Dataframe

Here is our last step to table editing in R. What I normally do at this point is rename this table with a real name (e.g., demo_tbl for demographic table) and export this into a *.csv of the same name. Then, I literally copy and paste these values in to a real MS Excel spreadsheet that has a table that I’m creating as part of a scientific publication. This is the only table editing in R I do; all I needed from R are these frequencies. Using programming in Excel, I can calculate any proportion I want, and display the cells however I want. It’s better to prepare tables for publication in Excel where you easily fuss with the appearance rather than R. However, if you are making figures for peer-reviewed articles, I strongly recommend R.

So here are my last two steps – renaming tbl to some real name, then exporting it as a *.csv. This *.csv will land in the same folder as you mapped the R GUI console to (to import the BRFSS_i.rds dataframe).

demo_tbl <- tbl
write.csv(demo_tbl, "demo_tbl.csv")

Updated August 19, 2023. Added video October 15, 2023.

Read all of our data science blog posts!

Descriptive Analysis of Black Friday Death Count Database: Creative Classification

Descriptive analysis of Black Friday Death Count Database provides an example of how creative classification [...]

Classification Crosswalks: Strategies in Data Transformation

Classification crosswalks are easy to make, and can help you reduce cardinality in categorical variables, [...]

FAERS Data: Getting Creative with an Adverse Event Surveillance Dashboard

FAERS data are like any post-market surveillance pharmacy data – notoriously messy. But if you [...]

Dataset Source Documentation: Necessary for Data Science Projects with Multiple Data Sources

Dataset source documentation is good to keep when you are doing an analysis with data [...]

Joins in Base R: Alternative to SQL-like dplyr

Joins in base R must be executed properly or you will lose data. Read my [...]

NHANES Data: Pitfalls, Pranks, Possibilities, and Practical Advice

NHANES data piqued your interest? It’s not all sunshine and roses. Read my blog post [...]

Color in Visualizations: Using it to its Full Communicative Advantage

Color in visualizations of data curation and other data science documentation can be used to [...]

Defaults in PowerPoint: Setting Them Up for Data Visualizations

Defaults in PowerPoint are set up for slides – not data visualizations. Read my blog [...]

Text and Arrows in Dataviz Can Greatly Improve Understanding

Text and arrows in dataviz, if used wisely, can help your audience understand something very [...]

Shapes and Images in Dataviz: Making Choices for Optimal Communication

Shapes and images in dataviz, if chosen wisely, can greatly enhance the communicative value of [...]

Table Editing in R is Easy! Here Are a Few Tricks…

Table editing in R is easier than in SAS, because you can refer to columns, [...]

R for Logistic Regression: Example from Epidemiology and Biostatistics

R for logistic regression in health data analytics is a reasonable choice, if you know [...]

1 Comments

Connecting SAS to Other Applications: Different Strategies

Connecting SAS to other applications is often necessary, and there are many ways to do [...]

Portfolio Project Examples for Independent Data Science Projects

Portfolio project examples are sometimes needed for newbies in data science who are looking to [...]

Project Management Terminology for Public Health Data Scientists

Project management terminology is often used around epidemiologists, biostatisticians, and health data scientists, and it’s [...]

Rapid Application Development Public Health Style

“Rapid application development” (RAD) refers to an approach to designing and developing computer applications. In [...]

Understanding Legacy Data in a Relational World

Understanding legacy data is necessary if you want to analyze datasets that are extracted from [...]

Front-end Decisions Impact Back-end Data (and Your Data Science Experience!)

Front-end decisions are made when applications are designed. They are even made when you design [...]

Reducing Query Cost (and Making Better Use of Your Time)

Reducing query cost is especially important in SAS – but do you know how to [...]

Curated Datasets: Great for Data Science Portfolio Projects!

Curated datasets are useful to know about if you want to do a data science [...]

Statistics Trivia for Data Scientists

Statistics trivia for data scientists will refresh your memory from the courses you’ve taken – [...]

Management Tips for Data Scientists

Management tips for data scientists can be used by anyone – at work and in [...]

REDCap Mess: How it Got There, and How to Clean it Up

REDCap mess happens often in research shops, and it’s an analysis showstopper! Read my blog [...]

GitHub Beginners in Data Science: Here’s an Easy Way to Start!

GitHub beginners – even in data science – often feel intimidated when starting their GitHub [...]

ETL Pipeline Documentation: Here are my Tips and Tricks!

ETL pipeline documentation is great for team communication as well as data stewardship! Read my [...]

Benchmarking Runtime is Different in SAS Compared to Other Programs

Benchmarking runtime is different in SAS compared to other programs, where you have to request [...]

End-to-End AI Pipelines: Can Academics Be Taught How to Do Them?

End-to-end AI pipelines are being created routinely in industry, and one complaint is that academics [...]

Referring to Columns in R by Name Rather than Number has Pros and Cons

Referring to columns in R can be done using both number and field name syntax. [...]

The Paste Command in R is Great for Labels on Plots and Reports

The paste command in R is used to concatenate strings. You can leverage the paste [...]

Coloring Plots in R using Hexadecimal Codes Makes Them Fabulous!

Recoloring plots in R? Want to learn how to use an image to inspire R [...]

Adding Error Bars to ggplot2 Plots Can be Made Easy Through Dataframe Structure

Adding error bars to ggplot2 in R plots is easiest if you include the width [...]

AI on the Edge: What it is, and Data Storage Challenges it Poses

“AI on the edge” was a new term for me that I learned from Marc [...]

Pie Chart ggplot Style is Surprisingly Hard! Here’s How I Did it

Pie chart ggplot style is surprisingly hard to make, mainly because ggplot2 did not give [...]

Time Series Plots in R Using ggplot2 Are Ultimately Customizable

Time series plots in R are totally customizable using the ggplot2 package, and can come [...]

Data Curation Solution to Confusing Options in R Package UpSetR

Data curation solution that I posted recently with my blog post showing how to do [...]

Making Upset Plots with R Package UpSetR Helps Visualize Patterns of Attributes

Making upset plots with R package UpSetR is an easy way to visualize patterns of [...]

4 Comments

Making Box Plots Different Ways is Easy in R!

Making box plots in R affords you many different approaches and features. My blog post [...]

Convert CSV to RDS When Using R for Easier Data Handling

Convert CSV to RDS is what you want to do if you are working with [...]

GPower Case Example Shows How to Calculate and Document Sample Size

GPower case example shows a use-case where we needed to select an outcome measure for [...]

Querying the GHDx Database: Demonstration and Review of Application

Querying the GHDx database is challenging because of its difficult user interface, but mastering it [...]

Variable Names in SAS and R Have Different Restrictions and Rules

Variable names in SAS and R are subject to different “rules and regulations”, and these [...]

Referring to Variables in Processing Data is Different in SAS Compared to R

Referring to variables in processing is different conceptually when thinking about SAS compared to R. [...]

Counting Rows in SAS and R Use Totally Different Strategies

Counting rows in SAS and R is approached differently, because the two programs process data [...]

Native Formats in SAS and R for Data Are Different: Here’s How!

Native formats in SAS and R of data objects have different qualities – and there [...]

SAS-R Integration Example: Transform in R, Analyze in SAS!

Looking for a SAS-R integration example that uses the best of both worlds? I show [...]

Dumbbell Plot for Comparison of Rated Items: Which is Rated More Highly – Harvard or the U of MN?

Want to compare multiple rankings on two competing items – like hotels, restaurants, or colleges? [...]

2 Comments

Data for Meta-analysis Need to be Prepared a Certain Way – Here’s How

Getting data for meta-analysis together can be challenging, so I walk you through the simple [...]

Sort Order, Formats, and Operators: A Tour of The SAS Documentation Page

Get to know three of my favorite SAS documentation pages: the one with sort order, [...]

Confused when Downloading BRFSS Data? Here is a Guide

I use the datasets from the Behavioral Risk Factor Surveillance Survey (BRFSS) to demonstrate in [...]

2 Comments

Doing Surveys? Try my R Likert Plot Data Hack!

I love the Likert package in R, and use it often to visualize data. The [...]

2 Comments

I Used the R Package EpiCurve to Make an Epidemiologic Curve. Here’s How It Turned Out.

With all this talk about “flattening the curve” of the coronavirus, I thought I would [...]

Which Independent Variables Belong in a Regression Equation? We Don’t All Agree, But Here’s What I Do.

During my failed attempt to get a PhD from the University of South Florida, my [...]

Table editing in R is easier than in SAS, because you can refer to columns, rows, and individual cells in the same way you do in MS Excel. Read my blog post for example R table editing code.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Verified by MonsterInsights