Tag Archives: data science

“Bad Blood” Reveals Theranos was Guilty of Bad Business and Bad Data Science: Part 1 of 5

Businesses that are chaotic and poorly run do not steward their data properly, so their data end up inaccurate.

This is my first blog post in a series of five where I talk about data-related misconduct outlined in the book “Bad Blood”, and provide guidance on how to prevent it.

Data Science of Data Collection: Free Course and Course Series!

Take this free online course in the data science of data collection to further your career

Learn the “data science of data collection” through my free introductory course! If you want to learn more, continue with the whole six-course series. Great for graduate students and QA/QI professionals!

Applying Rothman’s Causal Pie Model to the Death of George Floyd

Weighing relative causes visually is easier with Rothman's causal pie model

In the murder trial of Officer Derek Chauvin, the prosecution must demonstrate that the police officer’s knee on George Floyd’s neck constituted a “substantial” cause of Mr. Floyd’s death “beyond a reasonable doubt”. This presents a challenge in weighing relative causes of death, which is essentially a problem of causal inference. My blog post demonstrates […]

Two Takeaways from Danny Ma’s Machine Learning Panel: Understanding the Problem, and Understanding your Data

Roller coaster like an ETL pipeline that does automation

This lively panel discussed many topics around designing and implementing machine learning pipelines. Two main issues were identified. The first is that you really have to take some time to do exploratory research and define the problem. The second is that you need to also understand the business rules and context behind the data.

Data Scientists Interested in Encryption Should Take this Online Cryptography Course

Cartoon of person programming with code in the background

Even if you do not deal directly with cryptography, the need to maintain data privacy often pushes data scientists to study it. This basic online course is part of an ethical hacking certification and gives an overview of issues with data transfer and cryptography.

If You Want to Increase Conversions, Try my A/B Testing Course on LinkedIn Learning

Learn about data science from doing real world projects online or in a laboratory

A/B testing seems straightforward, but there are a lot of picky details. What A and B conditions do you actually test? How long do you run the test? How do you calculate the statistics for the test? Answer your questions by taking this LinkedIn Learning course.
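To give a flavor of the "statistics" question, here is a minimal sketch of one common approach, a two-proportion z-test on conversion rates. This is an illustration only and not necessarily the method taught in the course; the function name and the example counts are made up for this sketch.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of conversions in conditions A and B (hypothetical inputs)
    n_a, n_b: number of visitors in conditions A and B
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF computed from the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up example: 100 of 1,000 visitors convert on A, 150 of 1,000 on B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The z-test assumes reasonably large samples and a sample size fixed in advance; checking results repeatedly and stopping early can inflate the false positive rate, which is one reason the "how long do you run the test?" question matters.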

Announcing the Publication of my New SAS Book on Data Warehousing

Learn how to do data warehousing in SAS. You can purchase this book and use the code in it to help you.

SAS is known for big data and data warehousing, but how do you actually design and build a SAS data warehouse or data lake? What datasets do you include? How do you transform them? How do you serve warehouse users? How do you manage your developers? This book has your answers!

Announcing my New(-ish) Data Curation Course on LinkedIn Learning!

Data curation is an important skill to learn if you want to manage a data science team.

Curation files are especially helpful for communicating about data on teams. Find out what you’ll learn when you take my online LinkedIn Learning data curation course!

Confused when Downloading BRFSS Data? Here is a Guide

You can download public data from health surveillance surveys. However, you have to know how to locate it on the web site.

I use the datasets from the Behavioral Risk Factor Surveillance System (BRFSS) in a lot of my data science tutorials. The BRFSS datasets are free and available to the public, but they are kind of buried on the web site. This blog post serves as a “map” to help you find them!

Doing Surveys? Try my R Likert Plot Data Hack!

The Likert package in R can visualize categorical data.

I love the Likert package in R, and use it often to visualize data. The problem is that sometimes, I have sparse data, and this can cause problems with the package. This blog post shows you a workaround, and also, a way to format the final plot that I think looks really great!
