Category Archives: Data Science

Posts about data science topics.

Variable Names in SAS and R Have Different Restrictions and Rules

You need to come up with names of variables in SAS and in R, but they need to be compatible with both languages if you are running a data warehouse.

Variable names in SAS and R are subject to different “rules and regulations”, and these can be leveraged to your advantage, as I describe in this blog post.

Referring to Variables in Processing Data is Different in SAS Compared to R

When doing data processing, especially extract-transform-load (ETL) into a data warehouse, you might need to refer to the variables in your code, and it's done differently in SAS vs. R.

Referring to variables during data processing works differently, at a conceptual level, in SAS than in R. I explain the differences in my blog post.

Counting Rows in SAS and R Use Totally Different Strategies

If you are a data scientist working with large datasets, you need to learn the commands to count both columns and rows in the dataset, whether you are using SAS or R.

Counting rows in SAS and R is approached differently, because the two programs process data in different ways. Read my blog post where I describe both ways.

Native Formats in SAS and R for Data Are Different: Here’s How!

Why use particular data formats for different programming languages in statistics? Because the programs can then process the data faster and with more accuracy.

The native formats of data objects in SAS and R have different qualities, and there are reasons behind these differences. Learn about them in this blog post!

SAS-R Integration Example: Transform in R, Analyze in SAS!

You can use SAS and R together in one project. I show you how to develop an analytic dataset in R and put it in SAS ODA for analysis.

Looking for a SAS-R integration example that uses the best of both worlds? I show you a use case where I was in a hurry, so I did the transformation in R and the analysis in SAS!

Dumbbell Plot for Comparison of Rated Items: Which is Rated More Highly – Harvard or the U of MN?

This is an example of a dumbbell plot from the ggalt package in R that you can also use in RStudio.

Want to compare multiple rankings of two competing items, such as hotels, restaurants, or colleges? I show you an example of using a dumbbell plot for comparison in R with the ggalt package for this exact use case!

Data for Meta-analysis Need to be Prepared a Certain Way – Here’s How

This is the forest plot resulting from analysis with the open source statistical software R, using the rmeta package.

Getting data for meta-analysis together can be challenging, so I walk you through the simple steps I take, starting with the scientific literature, and ending with a gorgeous and evidence-based forest plot!

Sort Order, Formats, and Operators: A Tour of The SAS Documentation Page

SAS software sorting a to z or using arithmetic operators

Get to know three of my favorite SAS documentation pages: the one with sort order, the one that lists all the SAS formats, and the one that explains all the SAS operators and expressions!

Confused when Downloading BRFSS Data? Here is a Guide

You can download public data from health surveillance surveys. However, you have to know how to locate it on the web site.

I use the datasets from the Behavioral Risk Factor Surveillance System (BRFSS) for demonstrations in a lot of my data science tutorials. The BRFSS datasets are free and available to the public, but they are kind of buried on the web site. This blog post serves as a “map” to help you find them!

Doing Surveys? Try my R Likert Plot Data Hack!

The likert package in R can visualize categorical data.

I love the likert package in R, and use it often to visualize data. The problem is that sometimes I have sparse data, and this can cause problems with the package. This blog post shows you a workaround, and also a way to format the final plot that I think looks really great!
