Category Archives: Data Science

Posts about data science topics.

Apply Weights? It’s Easy in R with the Survey Package!

You can do a population-based analysis if the original dataset used multi-stage sampling.

Apply weights to get weighted proportions and counts! Read my blog post to learn how to use the survey package in R.
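
To give a rough sense of what "weighted proportions and counts" means, here is a minimal, language-neutral sketch in Python. It is not the survey package's method: the post covers the survey package in R, which also handles design-based standard errors, while this sketch only computes point estimates. The variable names and data below are hypothetical.

```python
from collections import defaultdict


def weighted_counts(categories, weights):
    """Sum the sampling weight of every respondent in each category."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    return dict(totals)


def weighted_proportions(categories, weights):
    """Divide each weighted count by the sum of all weights."""
    counts = weighted_counts(categories, weights)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical survey responses and sampling weights (assumed for illustration).
smoker = ["yes", "no", "no", "yes", "no"]
weight = [100.0, 250.0, 150.0, 300.0, 200.0]

counts = weighted_counts(smoker, weight)
props = weighted_proportions(smoker, weight)
print(counts)
print(props)
```

Each respondent stands in for as many people as their weight says, so two "yes" answers out of five can represent a different share of the population than 40% would suggest in an unweighted tally.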

Make Categorical Variable Out of Continuous Variable

You can calculate the mean, median, and mode of a continuous variable.

Make categorical variables by cutting up continuous ones. But where to put the boundaries? Get advice on my blog!
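
As a small illustration of cutting a continuous variable at chosen boundaries, here is a sketch in Python. The boundaries, labels, and the `cut` helper are assumptions made up for this example, not the ones recommended in the post. Here each boundary is treated as the lower edge of the next group.

```python
from bisect import bisect_right


def cut(value, boundaries, labels):
    """Return the label of the interval that value falls into.

    boundaries must be sorted; a value equal to a boundary goes
    into the higher group.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    return labels[bisect_right(boundaries, value)]


# Hypothetical ages and cut points (assumed for illustration).
ages = [17, 18, 34, 35, 64, 65, 80]
boundaries = [18, 35, 65]
labels = ["<18", "18-34", "35-64", "65+"]

age_groups = [cut(age, boundaries, labels) for age in ages]
print(age_groups)
```

Whichever side of a boundary you include, decide it once and apply it everywhere, because values that sit exactly on a cut point are where groupings tend to go wrong.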

Remove Rows in R with the Subset Command

Learn R programming and biostatistics

Removing rows by criteria is a common ETL operation, and my blog post shows you how to do it using the subset command.

CDC Wonder for Studying Vaccine Adverse Events: The Shameful State of US Open Government Data

The open government data movements in many countries have resulted in government data being available online.

CDC Wonder is an online query portal that serves as a gateway to many government datasets. Although antiquated, it still works for extracting data, and my blog post shows you how.

AI Careers: Riding the Bubble

If you are a data scientist, you may want to do statistics, but you may also be interested in machine learning and artificial intelligence.

AI careers are not easy to navigate. Read my blog post for foolproof advice for those interested in building a career in AI.

Descriptive Analysis of Black Friday Death Count Database: Creative Classification

The Black Friday Death Count database has a list of news reports of deaths or injuries on Black Friday.

A descriptive analysis of the Black Friday Death Count database provides an example of how creative classification can make a quick and easy data science portfolio project!

Classification Crosswalks: Strategies in Data Transformation

What if you have too many categories in a categorical variable? Your cardinality is too high for a chi-square analysis.

Classification crosswalks are easy to make, and can help you reduce cardinality in categorical variables, making for insightful data science portfolio projects with only descriptive statistics. Read my blog post for guidance!
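
To make the idea concrete, a crosswalk can be sketched as a simple lookup table from detailed categories to broader ones. The Python sketch below uses hypothetical occupation categories and a made-up `apply_crosswalk` helper. It is one possible way to do this, not necessarily the approach taken in the post.

```python
from collections import Counter


def apply_crosswalk(values, crosswalk, default="Other"):
    """Map each detailed category to its broad group.

    Categories missing from the crosswalk fall into the default group.
    """
    return [crosswalk.get(value, default) for value in values]


# Hypothetical detailed categories and crosswalk (assumed for illustration).
raw = ["nurse", "physician", "teacher", "professor", "nurse", "plumber"]
crosswalk = {
    "nurse": "Healthcare",
    "physician": "Healthcare",
    "teacher": "Education",
    "professor": "Education",
}

broad = apply_crosswalk(raw, crosswalk)
freq = Counter(broad)
print(freq)
```

Keeping the crosswalk as its own table, rather than burying the recoding in the analysis code, makes it easy to review and revise the groupings later.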

FAERS Data: Getting Creative with an Adverse Event Surveillance Dashboard

Want to learn more about pharmacy data? You can use adverse event data in a data science portfolio project.

FAERS data are like any post-market surveillance pharmacy data – notoriously messy. But if you apply strong study design skills and a scientific approach, you can use the FAERS online dashboard to obtain a dataset and develop an enlightening portfolio project. I show you how in my blog post!

Dataset Source Documentation: Necessary for Data Science Projects with Multiple Data Sources

If you work on a big data project with multiple source datasets, you run the risk of forgetting exactly how you blended them together.

Dataset source documentation is good to keep when you are doing an analysis with data from multiple datasets. Read my blog to learn how easy it is to throw together some quick dataset source documentation in PowerPoint so that you don’t forget what you did.

Joins in Base R: Alternative to SQL-like dplyr

In base R, you can execute SQL-like joins, as long as you use the correct code syntax.

Joins in base R must be executed properly or you will lose data. Read my tutorial on how to correctly execute left joins in base R.
