Data science newbies often approach me for advice on getting into healthcare data science. First, I warn them that a ton of subject matter expertise (SME) is necessary to make sense of the data. If this doesn’t scare them off, then I warn them that they will probably have to learn epidemiology if they want to be any good.
If they are still with me, then I send them through my healthcare data science newbie do-it-yourself “starter kit” – and it’s very do-it-yourself! I thought I’d save myself the trouble of repeating myself over and over by just making a blog post about it, and telling everyone what I tell whoever contacts me.
Here is my recommended course of action.
If You Are a Data Science Newbie and Also a Healthcare Newbie, Learn About Healthcare First
I’ve observed that there are roughly two kinds of people interested in a career in data science:
- “Younger” people who have never had a career in anything, and are looking for a humanitarian role in the world that involves data science, and
- “Older” people who have already had a career in something else (that may have to do with healthcare), and want to move into a career focused on data.
If you are the second kind of data science newbie – you have already worked in healthcare – then you do not really need to learn about healthcare. But if you’ve never been in the environment of healthcare, you really need to learn the setting and language.
Of course, the best way to learn about healthcare is to get a job in healthcare. But the second best thing is to study it, which you can do with my series of lectures on the US healthcare system that are free on YouTube.
Next, Learn About Healthcare Data
Healthcare data are a reflection of the business processes in healthcare, which is why I made you watch those lectures first. Once you familiarize yourself with the business of healthcare, you will realize that people miscommunicate in healthcare all the time. This is because different fields within healthcare speak different languages and follow different business processes.
In order to do healthcare analytics, you need to get everyone – the data people and the subject matter experts – on the same page. The way to do that is with data curation skills. Data curation is a way of documenting data so that everyone understands it. So once you get done with the lectures above, you should take my LinkedIn Learning course in data curation.
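As a rough illustration of what "documenting data so that everyone understands it" can look like, here is a minimal sketch of a data dictionary in Python. Everything in it is an assumption made up for this example: the variable names, the admission codes, and the `undocumented_values` helper are hypothetical and do not come from the course.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One row of a hypothetical data dictionary."""
    name: str    # column name as it appears in the dataset
    label: str   # plain-language meaning agreed on with the SMEs
    dtype: str   # expected type, written for humans
    allowed: dict = field(default_factory=dict)  # code -> meaning, if coded


# Hypothetical entries; real ones come out of conversations with the SMEs.
DATA_DICTIONARY = [
    Variable("patient_id", "De-identified patient identifier", "string"),
    Variable("admit_type", "Type of admission", "category",
             {1: "Emergency", 2: "Elective", 9: "Unknown"}),
    Variable("los_days", "Length of stay in days", "integer"),
]


def undocumented_values(record, dictionary):
    """List fields or codes in a record that the dictionary does not describe."""
    by_name = {v.name: v for v in dictionary}
    problems = []
    for key, value in record.items():
        var = by_name.get(key)
        if var is None:
            problems.append(f"{key}: not in data dictionary")
        elif var.allowed and value not in var.allowed:
            problems.append(f"{key}: undocumented code {value!r}")
    return problems


# Example: admit_type 3 is not a documented code, so it gets flagged.
print(undocumented_values(
    {"patient_id": "A1", "admit_type": 3, "los_days": 4}, DATA_DICTIONARY))
```

The point of the sketch is the shared documentation, not the code. Once the data people and the subject matter experts agree on the labels and codes, anything that does not match is something to talk about.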
Along with data curation, you need to learn about data abstraction. A lot of people have this fantasy that because so much of our healthcare data are digitized, we can take digital versions of the data and do healthcare analytics with them. In reality, we often have to manually abstract data out of electronic records and reenter them using data entry methods.
In addition to knowing how to do data abstraction, you just need to better understand what it’s like to collect data in a healthcare setting. I have a lot of customers who are doing quality assurance/quality improvement (QA/QI) in healthcare. They all find themselves making “minimal datasets” and documenting them, and spending a lot of time setting up and leading data collection efforts.
For that reason, I made a free, online overview course that explains data collection in healthcare called, “Understanding Research Forms, Surveys, and Instruments”. So that’s your next homework – take that free course.
Now That You Are Comfortable with the Data, Learn Study Design
Moving on along your data science journey, next, you need to learn a research design system. I’m an epidemiologist, so of course, I prefer epidemiology. The other main choice involves social science approaches like the ones used in psychology and education. The main reason I do not like those is that they were not invented for causal inference around medicine, and epidemiology was invented expressly for this. Since that is what we are doing in healthcare data science, I stick to the study design tools in epidemiology.
It’s hard to find study design courses for data scientists, so I made a two-course series on LinkedIn Learning, “Designing Big Data Healthcare Studies”. So that’s the next task on your adventure – take those courses.
Next, Learn Basic Statistics
The typical data science newbie jumps head first into learning R, SAS, or Python – which means that they are probably learning statistics first. As you can see, that is jumping ahead, in my opinion. It’s better to have a strong grasp of “small data statistics” before we move on to the operations on big data we do in SAS and Python. Especially with healthcare data, we need to have a strong feel for the meaning of the data before we start trying to do something grand like make an artificial intelligence algorithm.
I used to teach a course in undergraduate statistics at Laboure College in Milton, Massachusetts, and I recorded my lectures online. Those generous souls at Free Code Camp assembled them with my permission into one, huge, long 8-hour lecture, but that makes it hard to learn.
Instead, I encourage you to watch my playlist (or click on each video you want below), because I recently updated each description with timestamps, so it is very easy for you to move around in the video and rewatch anything you need!
If You Make It Here, You Are Not a Newbie Anymore!
Congratulations! If you make it here, you are no longer a newbie – you are advanced! Now, it is just a process of putting together what you know, and doing it with bigger datasets.
If you want to go into healthcare analytics, I hate to say it, but you absolutely have to learn SAS. SAS is a classic program whose development dates back to the 1960s, we have been using it in healthcare for decades, and it’s not going anywhere. In fact, people are currently migrating legacy SAS installations to the new cloud-based SAS Viya.
So if you make it here, the first thing you should do is take my two-course series on LinkedIn Learning in SAS. The first course teaches descriptive analysis, and the second teaches regression. Also, you will want to get started with SAS OnDemand for Academics (SAS ODA) – SAS’s free online version that allows you to practice the SAS you learn in my LinkedIn Learning courses. You should start by taking my free online course called “Getting Started with SAS ODA”, and if you want more practice, try taking my series of 23 courses (under development) that include challenges in SAS you can practice using SAS ODA!
If you want to be a manager in healthcare analytics (or, at least sound like one at a job interview), I strongly encourage you to also read my new book, “Mastering SAS Programming for Data Warehousing”. It includes all my tips and tricks for running a healthcare data warehouse or data lake and trying to keep all your stakeholders happy while you safeguard a boatload of super sensitive data.
Anything Else the Data Science Newbie Should Know?
The SAS courses above show you how to put together what you know from data, statistics, and study design to ask a research question, formulate an analytic plan, then conduct the analysis and answer the research question. I have a companion series of courses where you do the same thing, only in R.
Currently, we are seeing a greater adoption of R and Python in healthcare data science. This means that while SAS is “required knowledge” in healthcare analytics, R and Python are also slowly attaining that status. SAS and R integration shops are emerging.
Given this, if you feel like practicing your healthcare analytics some more, I encourage you to also take the companion R courses on LinkedIn Learning. You will reinforce what you already learned about study design, data, and statistics, and can just concentrate on practicing with the software. I don’t know Python, but I know R, and SAS is a lot harder than R. So if you learn SAS first, R will be a breeze.
Finally, if you take my study design courses on LinkedIn Learning, you’ll find that I do not cover “experimental design”, because that’s basically a clinical trial. You cannot do a clinical trial online.
But actually, you can – you just wouldn’t do a “drug” clinical trial online. When you do A/B testing for improving conversion rates – that’s essentially doing an online clinical trial. When my customers started asking me to teach them that, I responded by making my rather popular LinkedIn Learning course, “Data Science of Experimental Design”. I’m not sure you’d ever use this in healthcare, but people like it, so maybe you will, too. I’ll link you to it below.
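To make the A/B testing comparison concrete, here is a minimal sketch in Python, using only the standard library. It compares the conversion rates of two arms with a pooled two-proportion z-test, which is one common way to analyze such an experiment and not necessarily the method taught in the course. The function name and the visitor counts are hypothetical, made up for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for arm B versus arm A.

    conv_* are counts of conversions, n_* are counts of visitors.
    Returns (rate_a, rate_b, z, two_sided_p).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: 1,000 visitors randomized to each version of a page.
rate_a, rate_b, z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```

The parallel to a clinical trial is the random assignment: visitors are randomized to version A or B the way participants are randomized to treatment or control, and the outcome (conversion) is compared between arms.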
Hopefully, this is enough to get you started on your own. Also, don’t forget to:
- Follow my Twitter feed where I tweet out data science resources;
- Follow my YouTube channel where I regularly post tutorial videos;
- Connect with me on LinkedIn so that my data science posts and articles show up in your feed;
- Sign up for my online data science newsletter mailing list; and
- Follow my blog – look in the upper right for the “subscribe” button. You can enter your e-mail address, and then get an e-mail whenever I add a new blog post!
If you have questions, feel free to e-mail me, or connect with me and message me on LinkedIn.
Updated June 21, 2021. Added more information about SAS ODA and social media links on December 7, 2021. Added video December 23, 2021. Added video block March 30, 2022. Added mentoring program banner September 26, 2022. Revised banners June 25, 2023. Updated banners July 2, 2024.
Monika posts her “data science newbie do-it-yourself starter kit”, with links to cheap or free learning resources for the data science newbie who wants to get started in healthcare analytics.