Category Archives: Data Science

Posts about data science topics.

Benchmarking Runtime is Different in SAS Compared to Other Programs

How do you measure how long it takes for code to run in different programs? And why would you want to measure something like that? Mainly, the reason to benchmark runtime is so that you can figure out how to optimize your code.

Benchmarking runtime works differently in SAS than in other programs. In other programs, you have to request the system time before and after the code you want to time, store both values in variables, and subtract one from the other, as I demonstrate in this blog post.
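As a rough illustration of the before-and-after approach, here is a minimal sketch in R. The workload being timed is a made-up placeholder, and the variable names are my own assumptions rather than anything taken from the post.

```r
# Request the system time before the code we want to time
start_time <- Sys.time()

# Placeholder workload (hypothetical): sum the square roots of 1 to 1,000,000
total <- sum(sqrt(seq_len(1e6)))

# Request the system time again after the code has run
end_time <- Sys.time()

# Subtract the two stored values to get the elapsed runtime in seconds
elapsed <- difftime(end_time, start_time, units = "secs")
print(elapsed)
```

The elapsed time will vary from run to run, so it is usually worth repeating the measurement a few times before drawing conclusions.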

End-to-End AI Pipelines: Can Academics Be Taught How to Do Them?

What is an end-to-end AI pipeline? And why are academics so bad at making one? These are the questions we will examine in this blog post.

End-to-end AI pipelines are being created routinely in industry, and one complaint is that academics can only contribute to one component of the pipeline. Really? Read my blog post for an alternative viewpoint!

Referring to Columns in R by Name Rather than Number has Pros and Cons

There are different ways to refer to variables in R dataframes. You can use field names, and you can also use field numbers.

Referring to columns in R can be done using both number and field name syntax. Although field name syntax is easier to use in programming, my blog demonstrates how you can use column numbers to make automation easier.
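To make the two syntaxes concrete, here is a small sketch using a made-up dataframe (the column names and values are hypothetical). It shows the same column pulled by name and by number, and one way column numbers might help with automation.

```r
# Hypothetical example data
df <- data.frame(
  id     = 1:3,
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Refer to a column by field name
by_name <- df[["height"]]

# Refer to the same column by field number
by_number <- df[[2]]

# Column numbers make it easy to loop over a range of columns
col_means <- sapply(2:3, function(i) mean(df[[i]]))
names(col_means) <- names(df)[2:3]
print(col_means)
```

The trade-off is that column numbers break silently if the column order changes, while names keep pointing at the same field.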

The Paste Command in R is Great for Labels on Plots and Reports

The paste command is used to concatenate strings in R. You can use it in different ways, which is what I demonstrate in my blog and videos.

The paste command in R is used to concatenate strings. You can leverage the paste command to make refreshable label objects for reports and plots, as I describe in my blog post.
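Here is a minimal sketch of the idea of a refreshable label. The count and date are invented values standing in for whatever your report computes. Because the label is built from variables, rerunning the code with new data updates the text automatically.

```r
# Hypothetical values that would normally come from your data
n_patients  <- 42
report_date <- as.Date("2024-01-15")

# paste() joins its arguments with a space by default
plot_title <- paste("Patients enrolled:", n_patients)

# paste0() joins with no separator, so you control the spacing yourself
plot_subtitle <- paste0("Report date: ", format(report_date, "%Y-%m-%d"))

print(plot_title)
print(plot_subtitle)
```

These label objects can then be passed to a plot title or dropped into report text.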

Coloring Plots in R using Hexadecimal Codes Makes Them Fabulous!

You do not need to use the default R colors on your plot. You don't even need to limit yourself to named colors on cheat sheets.

Recoloring plots in R? Want to learn how to use an image to inspire R color palettes you can use in ggplot2 plots? Read my blog post to learn how.
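As a quick sketch of what hexadecimal colors look like in practice, the example below uses base R graphics rather than ggplot2 to keep it self-contained. The three hex codes and the bar heights are arbitrary choices for illustration, not a palette from the post.

```r
# Hypothetical palette written as hexadecimal codes (#RRGGBB)
my_palette <- c("#1B9E77", "#D95F02", "#7570B3")

# Each pair of hex digits is the red, green, or blue level from 0 to 255
rgb_values <- col2rgb(my_palette)
print(rgb_values)

# Use the hex codes anywhere R accepts a color
barplot(c(3, 5, 2), col = my_palette, names.arg = c("A", "B", "C"))
```

The same character vector of hex codes can be supplied to ggplot2 through a manual color scale.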

Adding Error Bars to ggplot2 Plots Can be Made Easy Through Dataframe Structure

Error bars on plots can give the audience a sense of how much certainty you have in your estimates.

Adding error bars to ggplot2 plots in R is easiest if you include the width of the error bar as a variable in your plot data. Read my blog post to see an example.
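One way this might look is sketched below. It assumes the ggplot2 package is installed, and the dataframe, its column names, and its values are all hypothetical. Here the `se` column holds the half-width of each error bar, which is my reading of "width of the error bar" and may differ from the structure used in the post.

```r
library(ggplot2)

# Hypothetical plot data: one row per bar, with the error bar size as a column
plot_data <- data.frame(
  group    = c("A", "B", "C"),
  estimate = c(10, 15, 12),
  se       = c(1.5, 2.0, 1.0)
)

# The error bar limits are computed directly from columns in the plot data
p <- ggplot(plot_data, aes(x = group, y = estimate)) +
  geom_col(fill = "#7570B3") +
  geom_errorbar(aes(ymin = estimate - se, ymax = estimate + se), width = 0.2)

print(p)
```

Keeping the error bar size in the same dataframe as the estimates means the plot code does not need any separate calculation step.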

AI on the Edge: What it is, and Data Storage Challenges it Poses

AI on the edge refers to doing the AI processing and calculations at the site of the device collecting the data.

“AI on the edge” was a new term for me that I learned from Marc Staimer, founder of Dragon Slayer Consulting, who was interviewed in a podcast. Marc explained how AI on the edge poses a data storage problem, and my blog post proposes a solution!

Pie Chart ggplot Style is Surprisingly Hard! Here’s How I Did it

How do you make a pie chart in the ggplot2 package in R? It's not that obvious.

Pie chart ggplot style is surprisingly hard to make, mainly because ggplot2 did not give us a circle shape to deal with. But I explain how to get around it in my blog post.

Time Series Plots in R Using ggplot2 Are Ultimately Customizable

Time series plots can be customized if you use the ggplot2 package in R. You can place labels and configure axes.

Time series plots in R are totally customizable using the ggplot2 package, and can come out with a look that is clean and sharp. However, you usually end up fighting with formatting the x-axis and other options, as I explain in my blog post.

Data Curation Solution to Confusing Options in R Package UpSetR

It is possible to use data curation to solve the problem of a confusing vector containing options.

The data curation solution that I posted recently with my blog post showing how to do upset plots in R using the UpSetR package was itself kind of a masterpiece. Therefore, I thought I’d dedicate this blog post to explaining how and why I did it.