Category Archives: Data Science

Posts about data science topics.

Descriptive Analysis of Black Friday Death Count Database: Creative Classification

The Black Friday Death Count database has a list of news reports of deaths or injuries on Black Friday.

Descriptive analysis of the Black Friday Death Count database provides an example of how creative classification can make for a quick and easy data science portfolio project!

Classification Crosswalks: Strategies in Data Transformation

What if you have too many categories in a categorical variable? Then your cardinality may be too high for a chi-square analysis.

Classification crosswalks are easy to make, and can help you reduce cardinality in categorical variables, making for insightful data science portfolio projects with only descriptive statistics. Read my blog post for guidance!
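
To illustrate the idea, here is a minimal sketch in base R. It is not taken from the blog post: the data frames `purchases` and `crosswalk`, their columns, and every value in them are hypothetical and made up for illustration. The crosswalk is simply a lookup table with one row per original category, joined onto the data to produce a broader grouping.

```r
# Hypothetical data: a categorical variable with many distinct levels
purchases <- data.frame(
  purchase_id = 1:6,
  item = c("apple", "pear", "carrot", "celery", "milk", "apple"),
  stringsAsFactors = FALSE
)

# Hypothetical crosswalk: one row per original level, mapped to a broader group
crosswalk <- data.frame(
  item = c("apple", "pear", "carrot", "celery", "milk"),
  item_group = c("Fruit", "Fruit", "Vegetable", "Vegetable", "Dairy"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every purchase, even if its level is missing from the crosswalk
recoded <- merge(purchases, crosswalk, by = "item", all.x = TRUE)

# Any NA here would flag a level the crosswalk forgot to map
unmapped <- sum(is.na(recoded$item_group))

# Frequencies on the reduced set of categories
group_counts <- table(recoded$item_group)
```

Keeping the mapping in its own table, rather than in nested `ifelse()` calls, means the classification decisions can be reviewed and revised in one place.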

FAERS Data: Getting Creative with an Adverse Event Surveillance Dashboard

Want to learn more about pharmacy data? You can use adverse event data in a data science portfolio project.

FAERS data are like any post-market surveillance pharmacy data – notoriously messy. But if you apply strong study design skills and a scientific approach, you can use the FAERS online dashboard to obtain a dataset and develop an enlightening portfolio project. I show you how in my blog post!

Dataset Source Documentation: Necessary for Data Science Projects with Multiple Data Sources

If you work on a big data project with multiple source datasets, you run the risk of forgetting exactly how you blended them together.

Dataset source documentation is good to keep when you are doing an analysis with data from multiple datasets. Read my blog post to learn how easy it is to throw together some quick dataset source documentation in PowerPoint so that you don’t forget what you did.

Joins in Base R: Alternative to SQL-like dplyr

In base R, you can execute SQL-like joins, as long as you use the correct code syntax.

Joins in base R must be executed properly or you will lose data. Read my tutorial on how to correctly execute left joins in base R.
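
As a minimal sketch of how rows can be lost (not the tutorial's own code), consider base R's `merge()`. The data frames `patients` and `labs` and their values are hypothetical. By default `merge()` performs an inner join, so rows without a match are silently dropped; setting `all.x = TRUE` makes it a left join.

```r
# Hypothetical data: four patients, but lab results for only three of them
patients <- data.frame(id = c(1, 2, 3, 4), age = c(34, 51, 29, 62))
labs <- data.frame(id = c(1, 2, 4), glucose = c(90, 110, 105))

# Default merge() is an inner join: patient 3 has no lab row and is dropped
inner <- merge(patients, labs, by = "id")

# all.x = TRUE makes it a left join: every patient is kept,
# and glucose is NA where no lab row matched
left <- merge(patients, labs, by = "id", all.x = TRUE)

nrow(inner)  # 3
nrow(left)   # 4
```

Comparing `nrow()` before and after a join is a quick way to confirm that no rows were lost.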

NHANES Data: Pitfalls, Pranks, Possibilities, and Practical Advice

If you are interested in population-level surveillance data, you might have thought about using NHANES data in portfolio projects.

NHANES data piqued your interest? It’s not all sunshine and roses. Read my blog post to see the pitfalls of NHANES data, and get practical advice about using them in a project.

Color in Visualizations: Using it to its Full Communicative Advantage

When using big data, you will want to make visualizations. How do you use color to the greatest communicative advantage?

Color in visualizations of data curation and other data science documentation can be used to enhance communication – I show you how!

Defaults in PowerPoint: Setting Them Up for Data Visualizations

The defaults in PowerPoint are really set up for making presentations, not data visualizations.

Defaults in PowerPoint are set up for slides – not data visualizations. Read my blog post for tips on reconfiguring PowerPoint to make it easy for dataviz!

Text and Arrows in Dataviz Can Greatly Improve Understanding

Adding text and arrows to diagrams can help your audience navigate the image, and understand what you are trying to communicate.

Text and arrows in dataviz, if used wisely, can help your audience understand something very abstract, like a data pipeline. Read my blog post for tips on choosing images for your data visualizations!

Shapes and Images in Dataviz: Making Choices for Optimal Communication

If you use good judgment in choosing shapes and images to add to your data visualizations, your audience will be enlightened.

Shapes and images in dataviz, if chosen wisely, can greatly enhance the communicative value of the visualization. Read my blog post for tips on selecting shapes for data visualizations!
