Tag Archives: data abstraction

Statistics Trivia for Data Scientists

Public health, artificial intelligence, and data science trivia! Fun! Educational! Test your knowledge!

Statistics trivia for data scientists will refresh your memory from the courses you’ve taken – or maybe teach you something new! Visit my blog to find out!

REDCap Mess: How it Got There, and How to Clean it Up

REDCap mess on your hands? The REDCap designers made the application so loosey goosey, you can really program yourself into a messy corner if you don't plan well.

REDCap mess happens often in research shops, and it’s an analysis showstopper! Read my blog post to learn my secret tricks for breaking through the barriers and getting on with data analytics!

Time Series Plots in R Using ggplot2 Are Ultimately Customizable

Time series plots can be customized if you use package ggplot2 in R. You can place labels and configure axes.

Time series plots in R are totally customizable using the ggplot2 package, and can come out with a look that is clean and sharp. However, you usually end up fighting with formatting the x-axis and other options, which I explain in my blog post.

Data Curation Solution to Confusing Options in R Package UpSetR

It is possible to use data curation to solve the problem of a confusing vector containing options.

The data curation solution that I posted recently, alongside my blog post showing how to make upset plots in R using the UpSetR package, was itself kind of a masterpiece. Therefore, I thought I’d dedicate this blog post to explaining how and why I did it.

Convert CSV to RDS When Using R for Easier Data Handling

If you want to use R for a project and the source CSV is very big, it can improve input/output efficiency to convert the file to an RDS.

Convert CSV to RDS is what you want to do if you are working with big data files in the R GUI and want to improve efficiency. Read my blog post for an explanation and video demonstrations of this process!

GPower Case Example Shows How to Calculate and Document Sample Size

This case example shows a use case where we estimated sample size in GPower under different conditions.

GPower case example shows a use-case where we needed to select an outcome measure for our study, then do a power calculation for sample size required under different outcome effect size scenarios. My blog post shows what I did, and how I documented/curated the results.

Querying the GHDx Database: Demonstration and Review of Application

Many data scientists interested in health are looking to query the Global Burden of Disease data, which is available through the Global Health Data Exchange, also known as the GHDx.

Querying the GHDx database is challenging because of its difficult user interface, but mastering it will allow you to access country-level health data for comparisons! See my demonstration!

Counting Rows in SAS and R Use Totally Different Strategies

If you are a data scientist working with large datasets, you need to learn the commands to count both columns and rows in the dataset, whether you are using SAS or R.

Counting rows in SAS and R is approached differently, because the two programs process data in different ways. Read my blog post where I describe both ways.

Data Science YouTube Channel Planned Expansion for 2022 – Please Subscribe!

In 2022 I am going to be putting new content on my YouTube channel focused on teaching data science, and providing educational resources.

Data science YouTube channel that brings you educational resources, career advice, live interactive sessions, and keeps you up-to-date in innovation and analytics – that’s what I have planned for 2022! Read my blog post for details.

Native Formats in SAS and R for Data Are Different: Here’s How!

Why use particular data formats for different programming languages in statistics? Because the programs can then process the data faster and with more accuracy.

Native formats in SAS and R of data objects have different qualities – and there are reasons behind these differences. Learn about them in this blog post!
