Front-end decisions made at the time of application design often impact the structure and quality of our back-end data. Let’s say we extract data from a medical record application like Epic and try to analyze it, and we see that the data are screwed up. Those of us in health data analytics are often at a loss to troubleshoot this problem, because many of us don’t even know what a “front-end” or a “back-end” is! So we can’t figure out why the data came out that way.
We don’t realize how much the front-end, which is the “window” the data pass through to enter the back-end when someone does data entry, shapes what shows up in the back-end: which columns are there, and which values they hold. If you are unfamiliar with the terms “front-end” and “back-end”, watch my video below for an example.
All Application Designers Must Make Front-end Decisions
Here’s something popular we use in health data analytics that we might not think of as being an application with a front-end and a back-end: SurveyMonkey! Actually, all survey applications, including REDCap, are technically applications with a front-end and a back-end. Since a survey is mostly front-end, applications like SurveyMonkey are optimized so that you can make excellent front-end decisions for survey administration.
SurveyMonkey and other survey software applications allow you to make important front-end decisions that make your survey data look awesome. First, they allow you to build in validation rules for free text variables. Next, they allow you to create low cardinality picklists. Picklists can restrict answers to one answer only (e.g., “What is your favorite color? Select one.”) or allow multiples (e.g., “What colors are you willing to wear? Check all that apply.”).
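To make this concrete, here is a minimal Python sketch of how these front-end choices could determine what lands in the back-end. It is not how SurveyMonkey or REDCap actually stores data. The field specification, the function names, and the `field___option` column naming are all assumptions made up for illustration. The point is only that a validated text field and a single-select picklist each produce one column, while a check-all-that-apply picklist produces one 0/1 column per option.

```python
import re

# Hypothetical front-end specification for three survey items (illustrative only).
FIELDS = [
    {"name": "zip_code", "type": "text", "pattern": r"\d{5}"},
    {"name": "fav_color", "type": "single", "options": ["red", "blue", "green"]},
    {"name": "wear_color", "type": "multi", "options": ["red", "blue", "green"]},
]


def backend_columns(fields):
    """List the back-end columns implied by the front-end specification."""
    cols = []
    for f in fields:
        if f["type"] == "multi":
            # Check-all-that-apply: one indicator column per option.
            cols.extend(f"{f['name']}___{opt}" for opt in f["options"])
        else:
            # Free text and single-select: one column each.
            cols.append(f["name"])
    return cols


def to_backend_row(fields, answers):
    """Validate one respondent's answers and shape them into a back-end row."""
    row = {}
    for f in fields:
        name, value = f["name"], answers.get(f["name"])
        if f["type"] == "text":
            # Validation rule: the whole string must match the pattern.
            if not re.fullmatch(f["pattern"], value or ""):
                raise ValueError(f"{name}: {value!r} fails validation")
            row[name] = value
        elif f["type"] == "single":
            # Low cardinality picklist: exactly one of the listed options.
            if value not in f["options"]:
                raise ValueError(f"{name}: {value!r} is not an option")
            row[name] = value
        else:
            chosen = set(value or [])
            unknown = chosen - set(f["options"])
            if unknown:
                raise ValueError(f"{name}: unknown options {sorted(unknown)}")
            for opt in f["options"]:
                row[f"{name}___{opt}"] = 1 if opt in chosen else 0
    return row


if __name__ == "__main__":
    print(backend_columns(FIELDS))
    print(to_backend_row(
        FIELDS,
        {"zip_code": "02139", "fav_color": "blue", "wear_color": ["red", "green"]},
    ))
```

Notice that the analyst who only sees the back-end gets five columns from three questions. Without knowing the front-end design, it is hard to tell why the “wear” item is spread over three 0/1 columns while the “favorite” item is a single text column.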
Third, survey software applications allow you to build in skip patterns, so you can have subsets of respondents opt out of certain items. In addition, survey software applications allow you to make different decisions about how you present Likert scale or other ordinal items – in a matrix, or using fancy controls like dials. Watch my livestream where I show you how to make a survey specification before programming a SurveyMonkey survey so you can see how I document the decisions I make when designing my SurveyMonkey front-end.
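Skip patterns are another place where a front-end decision shows up in the back-end. The sketch below is a minimal Python illustration under assumptions of my own: the item names, the `show_if` rule format, and the `NA_SKIPPED` marker are hypothetical and do not come from any particular survey product. It shows why it helps to record a skipped item differently from an item that was shown but left blank.

```python
# Hypothetical marker for "item was never shown to this respondent".
SKIPPED = "NA_SKIPPED"

# Hypothetical items: cigs_per_day is only shown when smoker == "yes".
ITEMS = [
    {"name": "smoker", "show_if": None},
    {"name": "cigs_per_day", "show_if": ("smoker", "yes")},
]


def apply_skip_pattern(items, answers):
    """Build a back-end row, marking items hidden by a skip pattern."""
    row = {}
    for item in items:
        rule = item["show_if"]
        # An item is shown if it has no rule, or if the earlier item
        # it depends on was recorded with the triggering value.
        shown = rule is None or row.get(rule[0]) == rule[1]
        if shown:
            # None here means "shown, but the respondent left it blank".
            row[item["name"]] = answers.get(item["name"])
        else:
            row[item["name"]] = SKIPPED
    return row


if __name__ == "__main__":
    print(apply_skip_pattern(ITEMS, {"smoker": "no"}))
    print(apply_skip_pattern(ITEMS, {"smoker": "yes", "cigs_per_day": 10}))
    print(apply_skip_pattern(ITEMS, {"smoker": "yes"}))
```

If both cases were stored as a blank, an analyst looking only at the back-end could not tell a non-smoker who was correctly skipped from a smoker who refused to answer.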
Even though epidemiologists and biostatisticians regularly use survey software applications, they often have trouble with them, because they don’t really understand how these applications are built and how they work. They might observe that their survey data are being recorded in some sort of screwed up way, but they lack the background in application design and development to troubleshoot easily.
Document Your Front-end Decisions with Data Curation
As you can see if you watch my video above or read my blog post about REDCap, I take my front-ends (including all my surveys!) very seriously. I do a lot of data curation in the design of my surveys. I’m very careful when I implement my surveys in the software, because it’s very easy to create a hot mess in the back-end if you do something wrong in the front-end!
Data curation is the key to managing data science teams, which is why it’s a central topic in my online boot camp course for research data managers, “How to do Data Close-out”. I feel that if you can master data curation, you can communicate clearly with others about data, and that is the key to being a successful manager in the data science field.
Updated June 10, 2023.
Front-end decisions are made when applications are designed. They are even made when you design a survey in SurveyMonkey. What health data analysts often don’t realize is that these decisions have a profound impact on the quality and accuracy of the data that are collected through these front-ends, which is the focus of this blog post.