Category Archives: Data Science

Posts about data science topics.

Table Editing in R is Easy! Here Are a Few Tricks…

When you use a data analysis program like R or SAS, you often have to do some data editing. This can be difficult, because these programs were designed for calculation rather than data transformation.

Table editing in R is easier than in SAS, because you can refer to columns, rows, and individual cells in much the same way you do in Microsoft Excel. Read my blog post for example R table-editing code.

R for Logistic Regression: Example from Epidemiology and Biostatistics

Logistic regression models the log odds of the outcome as a linear function of the predictors. Many people are used to using SAS for logistic regression, but you can also use R.
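For reference, the relationship described above can be written as a formula. Here p is the probability of the outcome and x_1, …, x_k are predictors; the symbols are illustrative and not taken from the post. Exponentiating a coefficient gives the odds ratio for a one-unit increase in that predictor, which is the quantity usually reported in epidemiology.

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```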

R is a reasonable choice for logistic regression in health data analytics if you know which packages to use. You don’t have to use SAS! My blog post provides example R code and a tutorial!

Connecting SAS to Other Applications: Different Strategies

Did you know it is possible to integrate SAS with other data environments, like Microsoft SQL Server or Excel?

Connecting SAS to other applications is often necessary, and there are many ways to do it. Read this blog post for a couple of use cases of SAS data integration using various SAS components.

Portfolio Project Examples for Independent Data Science Projects

Are you a data scientist who is interested in doing independent portfolio projects to sharpen your skills? Then I strongly suggest you get a coach or a mentor.

Newcomers to data science who want to complete independent projects often need portfolio project examples to get started. This blog post provides some great examples of independent projects you can do with datasets available online!

Project Management Terminology for Public Health Data Scientists

If you are a health data analyst or a biostatistician, you may have noticed that computer programmers and application developers use different terminology for the same ideas and concepts.

Project management terminology is often used around epidemiologists, biostatisticians, and health data scientists, and it’s often hard for us to admit we aren’t familiar with some of the terms. Watch my videos and take my Applications Basics course to get up to speed with vocabulary from the health application development domain.

Rapid Application Development Public Health Style

If you work on front-ends or back-ends of health applications, you are probably already familiar with the concepts of Agile and rapid application development.

“Rapid application development” (RAD) refers to an approach to designing and developing computer applications. In public health and healthcare, we are not taught about application development – but it’s good for us to learn about it, since we have to deal with data from health applications. My blog post talks about the RAD approach I […]

Understanding Legacy Data in a Relational World

Data systems came into use in the 1960s and 1970s, but these were flat-file systems, usually running on IBM mainframes.

Understanding legacy data is necessary if you want to analyze datasets extracted from old systems. This knowledge is still relevant because many of these old systems remain in use today, as I discuss in my blog post.

Front-end Decisions Impact Back-end Data (and Your Data Science Experience!)

How the front-end and back-end are connected affects how data are stored in the application. As a result, data you extract can carry quality problems that originated in the front-end.

Front-end decisions are made when applications are designed. They are even made when you design a survey in SurveyMonkey. What health data analysts often don’t realize is that these decisions have a profound impact on the quality and accuracy of the data that are collected through these front-ends, which is the focus of this blog […]
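As a small illustration of the idea, the sketch below assumes a hypothetical survey where "sex" was collected as a free-text box instead of a fixed-choice dropdown. The values, the `CANONICAL` mapping, and the `normalize` helper are all hypothetical; the point is that a front-end design choice leaves the analyst with many spellings of two categories that must be cleaned after extraction.

```python
from collections import Counter

# Hypothetical entries typed into a free-text "sex" field on a survey front-end.
free_text_entries = ["F", "female", "Female ", "f", "M", "male", "Male", "m "]

# Hypothetical mapping the analyst has to write after extracting the data.
CANONICAL = {"f": "female", "female": "female", "m": "male", "male": "male"}


def normalize(value: str) -> str:
    """Trim whitespace, lowercase, and map to a canonical category (or 'unknown')."""
    key = value.strip().lower()
    return CANONICAL.get(key, "unknown")


# Distinct values as stored in the back-end versus after cleaning.
raw_counts = Counter(free_text_entries)
clean_counts = Counter(normalize(v) for v in free_text_entries)

print("Distinct raw values:", len(raw_counts))
print("After cleaning:", dict(clean_counts))
```

A dropdown offering only fixed choices would have stored two values from the start, so no cleanup step would be needed.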

Reducing Query Cost (and Making Better Use of Your Time)

Slow queries can happen in SAS, R, Python, SQL or any database language. These slow queries have a cost.

Reducing query cost is especially important in SAS – but do you know how to do it, or what it even means? Read my blog post to learn why this is important in health data analytics.
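One common way to reduce query cost is to index the column you filter on, so the database can look up matching rows instead of scanning the whole table. The sketch below uses Python's built-in `sqlite3` module rather than SAS, and the `visits` table, `patient_id` column, and index name are hypothetical. It is not necessarily the approach the post describes. It prints SQLite's query plan before and after creating an index; the exact plan wording varies by SQLite version.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the detail text from SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, patient_id INTEGER, visit_date TEXT)"
)
# Hypothetical data: 10,000 visits spread across 500 patients.
conn.executemany(
    "INSERT INTO visits (patient_id, visit_date) VALUES (?, ?)",
    [(i % 500, "2024-01-01") for i in range(10_000)],
)

sql = "SELECT visit_date FROM visits WHERE patient_id = 42"
before = query_plan(conn, sql)  # without an index: full table scan

conn.execute("CREATE INDEX idx_visits_patient ON visits (patient_id)")
after = query_plan(conn, sql)  # with an index: lookup via the index

print("Before index:", before)
print("After index: ", after)
conn.close()
```

The same reasoning (filter early, and give the engine a way to avoid reading every row) applies in other environments, although the tools for inspecting the plan differ.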

Curated Datasets: Great for Data Science Portfolio Projects!

If you need data to do a project, read this blog post for information.

Curated datasets are useful to know about if you want to do a data science portfolio project on your own. I wrote this blog post for our group mentoring program. Check out the datasets I recommend on my blog!
