Category Archives: Data Science

Posts about data science topics.

Pie Chart ggplot Style is Surprisingly Hard! Here’s How I Did it

How do you make a pie chart with the ggplot2 package in R? It's not that obvious.

A ggplot-style pie chart is surprisingly hard to make, mainly because ggplot2 does not give us a circle shape to work with. But I explain how to get around it in my blog post.

Time Series Plots in R Using ggplot2 Are Ultimately Customizable

Time series plots can be customized if you use the ggplot2 package in R. You can place labels and configure axes.

Time series plots in R are totally customizable using the ggplot2 package, and can come out looking clean and sharp. However, you usually end up fighting with the x-axis formatting and other options, which I explain how to handle in my blog post.

Data Curation Solution to Confusing Options in R Package UpSetR

It is possible to use data curation to solve the problem of a confusing vector of options.

The data curation solution that I posted recently in my blog post showing how to make upset plots in R using the UpSetR package was itself kind of a masterpiece. Therefore, I thought I’d dedicate this blog post to explaining how and why I did it.

Making Upset Plots with R Package UpSetR Helps Visualize Patterns of Attributes

If you are having trouble setting options when making plots in R, then you should read this blog post.

Making upset plots with R package UpSetR is an easy way to visualize patterns of attributes in your data. My blog post demonstrates plotting patterns of co-morbidities in health survey respondents from the BRFSS, and walks you through setting text and color options in the code.

Making Box Plots Different Ways is Easy in R!

There are two main ways to make box plots in R, and this blog post shows you how, and explains the differences.

Making box plots in R affords you many different approaches and features. My blog post will show you easy ways to use both base R and ggplot2 to make box plots as you proceed with your data science projects.

Convert CSV to RDS When Using R for Easier Data Handling

If you want to use R for a project and the source CSV is very big, it can improve input/output efficiency to convert the file to an RDS.

Converting CSV to RDS is what you want to do if you are working with big data files in the R GUI and want to improve efficiency. Read my blog post for an explanation and video demonstrations of this process!

GPower Case Example Shows How to Calculate and Document Sample Size

This case example shows a use case where we estimated sample size in GPower under different conditions.

This GPower case example shows a use case where we needed to select an outcome measure for our study, then do a power calculation for the sample size required under different outcome effect size scenarios. My blog post shows what I did, and how I documented and curated the results.

Querying the GHDx Database: Demonstration and Review of Application

Many data scientists interested in health are looking to query Global Burden of Disease data through the Global Health Data Exchange, also known as the GHDx.

Querying the GHDx database is challenging because of its difficult user interface, but mastering it will allow you to access country-level health data for comparisons! See my demonstration!

Variable Names in SAS and R Have Different Restrictions and Rules

You need to come up with names of variables in SAS and in R, but they need to be compatible with both languages if you are running a data warehouse.

Variable names in SAS and R are subject to different “rules and regulations”, and these can be leveraged to your advantage, as I describe in this blog post.

Referring to Variables in Processing Data is Different in SAS Compared to R

When doing data processing, especially extract-transform-load (ETL) into a data warehouse, you might need to refer to the variables in your code, and it's done differently in SAS vs. R.

Referring to variables in data processing is conceptually different in SAS compared to R. I explain the differences in my blog post.