GitHub beginners – even in data science – often feel intimidated when starting their GitHub accounts and trying to interact with the web page. Don’t be shy! Catch the highlights from a recent GitHub beginners workshop I held!
Category Archives: Data Science
Posts about data science topics.
ETL pipeline documentation is great for team communication as well as data stewardship! Read my blog post to learn my tips and tricks.
Benchmarking runtime works differently in SAS than in other programs: you have to request the system time before and after the code you want to time, store both values in variables, and subtract one from the other, as I demonstrate in this blog post.
End-to-end AI pipelines are being created routinely in industry, and one complaint is that academics can only contribute to one component of the pipeline. Really? Read my blog post for an alternative viewpoint!
You can refer to columns in R by either column number or field name. Although field name syntax is easier to use in programming, my blog post demonstrates how you can use column numbers to make automation easier.
The paste() function in R concatenates strings. You can leverage paste() to make refreshable label objects for reports and plots, as I describe in my blog post.
Recoloring plots in R? Want to learn how to use an image to inspire R color palettes you can use in ggplot2 plots? Read my blog post to learn how.
Adding error bars to ggplot2 plots in R is easiest if you include the width of the error bar as a variable in your plot data. Read my blog post to see an example.
“AI on the edge” was a new term for me that I learned from Marc Staimer, founder of Dragon Slayer Consulting, who was interviewed in a podcast. Marc explained how AI on the edge poses a data storage problem, and my blog post proposes a solution!
A ggplot2-style pie chart is surprisingly hard to make, mainly because ggplot2 does not give us a circle shape to work with. But I explain how to get around it in my blog post.