Category Archives: Data Science

Posts about data science topics.

CDC Wonder for Studying Vaccine Adverse Events: The Shameful State of US Open Government Data

Open government data movements in many countries have made government data available online.

CDC Wonder is an online query portal that serves as a gateway to many government datasets. Although antiquated, it still works for extracting data, and my blog post shows you how.

AI Careers: Riding the Bubble

If you are a data scientist, you may want to do statistics, but you may also be interested in machine learning and artificial intelligence.

AI careers are not easy to navigate. Read my blog post for practical advice on building a career in AI.

Descriptive Analysis of Black Friday Death Count Database: Creative Classification

The Black Friday Death Count database lists news reports of deaths and injuries that occurred on Black Friday.

A descriptive analysis of the Black Friday Death Count database shows how creative classification can make a quick and easy data science portfolio project!

Classification Crosswalks: Strategies in Data Transformation

What if a categorical variable has too many categories? Its cardinality may be too high for a chi-square analysis.

Classification crosswalks are easy to make and can help you reduce the cardinality of categorical variables, which makes for insightful data science portfolio projects that use only descriptive statistics. Read my blog post for guidance!
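
A classification crosswalk is essentially a lookup table that maps each original category to a broader one. The sketch below is a minimal Python illustration under stated assumptions, not code from the post: the dictionary CROSSWALK, the helper apply_crosswalk, and all category labels are hypothetical, and any value missing from the crosswalk is lumped into an "Other" group.

```python
from collections import Counter

# Hypothetical crosswalk: each fine-grained label maps to a broader group.
# These labels are made up for illustration, not taken from any real dataset.
CROSSWALK = {
    "shooting": "Violence",
    "stabbing": "Violence",
    "fistfight": "Violence",
    "trampled": "Crowd-related",
    "crushed in crowd": "Crowd-related",
    "parking lot collision": "Traffic",
    "pedestrian struck": "Traffic",
}


def apply_crosswalk(values, crosswalk, default="Other"):
    """Return the collapsed category for each raw value.

    Values that do not appear in the crosswalk are assigned to `default`.
    """
    return [crosswalk.get(value, default) for value in values]


raw = ["shooting", "trampled", "pedestrian struck", "stabbing",
       "pepper spray", "fistfight", "crushed in crowd"]
collapsed = apply_crosswalk(raw, CROSSWALK)

# Cardinality before and after collapsing.
print(len(set(raw)), "raw categories ->", len(set(collapsed)), "collapsed categories")
# Frequency table of the collapsed variable (a simple descriptive statistic).
print(Counter(collapsed))
```

Keeping the crosswalk as a separate table, rather than burying the recoding in if/else logic, makes it easy to review and revise the grouping decisions later.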

FAERS Data: Getting Creative with an Adverse Event Surveillance Dashboard

Want to learn more about pharmacy data? You can use adverse event data in a data science portfolio project.

FAERS data are like any post-market surveillance pharmacy data – notoriously messy. But if you apply strong study design skills and a scientific approach, you can use the FAERS online dashboard to obtain a dataset and develop an enlightening portfolio project. I show you how in my blog post!

Dataset Source Documentation: Necessary for Data Science Projects with Multiple Data Sources

If you work on a big data project with multiple source datasets, you run the risk of forgetting exactly how you blended them together.

Dataset source documentation is worth keeping when you are doing an analysis with data from multiple datasets. Read my blog post to learn how easy it is to throw together some quick dataset source documentation in PowerPoint so that you don’t forget what you did.

Joins in Base R: Alternative to SQL-like dplyr

In base R, you can execute SQL-like joins as long as you use the correct syntax.

Joins in base R must be executed properly or you will lose data. Read my tutorial on how to correctly execute left joins in base R.
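
The post works in base R, where merge() performs an inner join by default (only rows with a match in both tables survive) and setting all.x = TRUE turns it into a left join. To show why that choice matters, here is a language-neutral sketch written in Python rather than R; the helper names, the example records, and the assumption that each key appears at most once in the right-hand table are all hypothetical.

```python
def inner_join(left, right, key):
    """Keep only left rows whose key also appears in right."""
    lookup = {row[key]: row for row in right}  # assumes unique keys in right
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def left_join(left, right, key):
    """Keep every left row; fill right-side columns with None when unmatched."""
    lookup = {row[key]: row for row in right}  # assumes unique keys in right
    right_cols = {col for row in right for col in row if col != key}
    out = []
    for row in left:
        match = lookup.get(row[key], {col: None for col in right_cols})
        out.append({**row, **match})
    return out


# Hypothetical example records: participant id 2 has no lab result.
people = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 27}]
labs = [{"id": 1, "chol": 190}, {"id": 3, "chol": 220}]

print(len(inner_join(people, labs, "id")))  # participant 2 is silently dropped
print(len(left_join(people, labs, "id")))   # all participants are kept
```

The inner join loses the unmatched participant without any warning, which is exactly the kind of silent data loss a left join avoids.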

NHANES Data: Pitfalls, Pranks, Possibilities, and Practical Advice

If you are interested in population-level surveillance data, you might have thought about using NHANES data in portfolio projects.

NHANES data piqued your interest? It’s not all sunshine and roses. Read my blog post to see the pitfalls of NHANES data, and get practical advice about using them in a project.

Color in Visualizations: Using it to its Full Communicative Advantage

When working with big data, you will want to make visualizations. How do you use color to its greatest communicative advantage?

Color in visualizations of data curation and other data science documentation can be used to enhance communication – I show you how!

Defaults in PowerPoint: Setting Them Up for Data Visualizations

The defaults in PowerPoint are really set up for making presentations, not data visualizations.

Read my blog post for tips on reconfiguring PowerPoint’s defaults so that it is easy to use for dataviz!
