Connecting SAS to Other Applications: Different Strategies

Did you know it is possible to integrate SAS with other data environments, like Microsoft SQL or Excel?

Connecting SAS to other applications has had different implications over time. In my book, “Mastering SAS Programming for Data Warehousing”, I describe different ways of connecting SAS to other applications that I have been doing since the early 2000s. Regardless of whether you are using PC SAS or a SAS server, you can typically use the SAS Access component to open up an ODBC connection to another data environment (e.g., MS SQL or Excel).

But connecting SAS components together is kind of like making functional things with Legos, in that you have to put them together to make something that runs and accomplishes data tasks. I’ve found it helpful for SAS users (and non-SAS users, really) to explain not only how you can go about connecting SAS components together, but how you can actually develop a pipeline that works and does something you want with data.

Use Case #1: SAS Enterprise Guide (EG) for Connecting SAS to Other Environments

SAS Enterprise Guide (EG) is a separate application from SAS – and by that, I mean SAS EG versions separately from SAS itself. SAS EG was invented in the 1990s, and so it has kind of an XP look. I made a diagram of the interface that I included in my book that I’ll show you below.

SAS Enterprise Guide allows you to configure pipelines in SAS. You can arrange the windows as you like.

SAS EG Interface

I emphasize that the above is a diagram I drew – not a real screen shot. There are two reasons for this. First, I don’t want to have a copyright violation by taking a screen shot from someone else, because I do not have access to an instance of SAS EG.

But second – and perhaps, more pragmatically – it is very hard to arrange a SAS EG window and make it look good (kind of like trying to arrange an ERD to look good in SQL). As you can see in the diagram, I am showing an example of five windows opened and arranged: Process Flow, Project Tree, Tasks, Results, and Program. If you were really logged into an instance of SAS EG, you’d realize there are many more windows you could open and close and arrange yourself.

SAS EG for Connecting SAS to Other Environments

What you will realize if you actually use SAS EG is that it is actually a way of creating pipelines in SAS to make things more automated. In other words, you first have to have all your SAS code together and a basic plan of the steps you need to carry out in SAS. Then, you can automate it in EG.

Let’s imagine a scenario. Let’s say that you work on an analytics team that studies sales at your company, and you run new models each week. You have a main sales database in MS SQL, but you are an analyst, so you work in the company’s SAS environment. One of the things you do in SAS is make projections of future sales based on past sales – but in order to do that, you need regular weekly data reports from the live MS SQL database. Here is an example of a pipeline you might do manually that you could automate in EG.

You need to map out your data pipeline before you configure it in SAS Enterprise Guide. You will need to drag the objects out and adjust their properties.

Before we walk through this graphic, you have to imagine that you are sitting at your SAS server using SAS EG – which is connected to your SAS data stores already. Now we start the pipeline at the upper left. If you were doing things manually, your first step would be to fire up SAS Access and connect into the MS SQL sales server.

This presumes that you have already worked with the MS SQL server administrators, and together, you have built a SQL view to which your account has access. Then, when you use SAS Access to connect, you hit the view. Let’s say it is parameterized, so you can run the view for a certain time period (e.g., weekly). You need to pass it the parameter – which is easy to do manually if running the MS SQL view, but something you would have to automate if you doing it in EG.

Once you have the view in MS SQL, then in SAS, you’d use data step language or PROC SQL to save the view as a SAS dataset in your SAS environment. I recommend figuring out how to do all this manually first – and even making a diagram like this – before trying to build it in EG. My video below goes through a SAS white paper on the topic.

Use Case #2: Connecting SAS with Data Integration (DI) Studio

SAS Data Integration Studio, or DI Studio for short, is different than EG in that it is mainly for data integration. EG can do a lot of different things, because it basically builds pipelines. By contrast, SAS DI Studio is focused on data integration only, so it is optimized for connecting data stores to each other.

But as usual, SAS is very “SAS-centric” in that it always assumes you are in a SAS environment, and if there are data outside that environment that you need, your goal is to move it into that environment. The two main reasons why SAS typically assumes this with their products is that for one, I/O is very costly when moving data from outside the SAS environment into it. Secondly, SAS is an analytics platform – so theoretically, if you are using SAS, all of your analytics should be taking place within the SAS environment, so you need to start by putting all your data there. You are probably taking copies of data from other systems and moving them into your SAS environment for analysis, so when seen this way, it makes sense that SAS is so SAS-centric.

That is my preamble to the following diagram, where I demonstrate a use-case for SAS DI Studio at a research institute.

If you are using SAS because you are doing research, you probably have an analytics environment. Therefore, you want to move your data into that environment before analyzing it.

In the diagram, the biostatistician is in the SAS environment – along with the live research data on the SAS server, and the SAS DI Studio component. Like EG, DI Studio can set up pipelines – but again, you have to design things manually first.

The diagram imagines the statistician is trying to audit data in a research study. Specifically, the statistician is trying to figure out if all the enrolled participants have all their labs in (from the external SQL lab database in green), and have completed all their follow-up surveys (from the pink Excel dataset). So the statistician could set up pipelines in DI Studio to use SAS Access to go out and connect to the lab data in SQL and the survey data in Excel, pull them into the SAS environment, and compare them with the data in the SAS server.

The video below covers a SAS white paper that goes into more detail about using SAS DI Studio, including how to do bulk loading.

Updated July 29, 2023. Added enterprise guide video September 6, 2023.

Read all of our data science blog posts!

Make Categorical Variable Out of Continuous Variable

Make categorical variables by cutting up continuous ones. But where to put the boundaries? Get [...]

Remove Rows in R with the Subset Command

Remove rows by criteria is a common ETL operation – and my blog post shows [...]

CDC Wonder for Studying Vaccine Adverse Events: The Shameful State of US Open Government Data

CDC Wonder is an online query portal that serves as a gateway to many government [...]

AI Careers: Riding the Bubble

AI careers are not easy to navigate. Read my blog post for foolproof advice for [...]

Descriptive Analysis of Black Friday Death Count Database: Creative Classification

Descriptive analysis of Black Friday Death Count Database provides an example of how creative classification [...]

Classification Crosswalks: Strategies in Data Transformation

Classification crosswalks are easy to make, and can help you reduce cardinality in categorical variables, [...]

FAERS Data: Getting Creative with an Adverse Event Surveillance Dashboard

FAERS data are like any post-market surveillance pharmacy data – notoriously messy. But if you [...]

Dataset Source Documentation: Necessary for Data Science Projects with Multiple Data Sources

Dataset source documentation is good to keep when you are doing an analysis with data [...]

Joins in Base R: Alternative to SQL-like dplyr

Joins in base R must be executed properly or you will lose data. Read my [...]

NHANES Data: Pitfalls, Pranks, Possibilities, and Practical Advice

NHANES data piqued your interest? It’s not all sunshine and roses. Read my blog post [...]

Color in Visualizations: Using it to its Full Communicative Advantage

Color in visualizations of data curation and other data science documentation can be used to [...]

Defaults in PowerPoint: Setting Them Up for Data Visualizations

Defaults in PowerPoint are set up for slides – not data visualizations. Read my blog [...]

Text and Arrows in Dataviz Can Greatly Improve Understanding

Text and arrows in dataviz, if used wisely, can help your audience understand something very [...]

Shapes and Images in Dataviz: Making Choices for Optimal Communication

Shapes and images in dataviz, if chosen wisely, can greatly enhance the communicative value of [...]

Table Editing in R is Easy! Here Are a Few Tricks…

Table editing in R is easier than in SAS, because you can refer to columns, [...]

R for Logistic Regression: Example from Epidemiology and Biostatistics

R for logistic regression in health data analytics is a reasonable choice, if you know [...]

1 Comments

Connecting SAS to Other Applications: Different Strategies

Connecting SAS to other applications is often necessary, and there are many ways to do [...]

Portfolio Project Examples for Independent Data Science Projects

Portfolio project examples are sometimes needed for newbies in data science who are looking to [...]

Project Management Terminology for Public Health Data Scientists

Project management terminology is often used around epidemiologists, biostatisticians, and health data scientists, and it’s [...]

Rapid Application Development Public Health Style

“Rapid application development” (RAD) refers to an approach to designing and developing computer applications. In [...]

Understanding Legacy Data in a Relational World

Understanding legacy data is necessary if you want to analyze datasets that are extracted from [...]

Front-end Decisions Impact Back-end Data (and Your Data Science Experience!)

Front-end decisions are made when applications are designed. They are even made when you design [...]

Reducing Query Cost (and Making Better Use of Your Time)

Reducing query cost is especially important in SAS – but do you know how to [...]

Curated Datasets: Great for Data Science Portfolio Projects!

Curated datasets are useful to know about if you want to do a data science [...]

Statistics Trivia for Data Scientists

Statistics trivia for data scientists will refresh your memory from the courses you’ve taken – [...]

Management Tips for Data Scientists

Management tips for data scientists can be used by anyone – at work and in [...]

REDCap Mess: How it Got There, and How to Clean it Up

REDCap mess happens often in research shops, and it’s an analysis showstopper! Read my blog [...]

GitHub Beginners in Data Science: Here’s an Easy Way to Start!

GitHub beginners – even in data science – often feel intimidated when starting their GitHub [...]

ETL Pipeline Documentation: Here are my Tips and Tricks!

ETL pipeline documentation is great for team communication as well as data stewardship! Read my [...]

Benchmarking Runtime is Different in SAS Compared to Other Programs

Benchmarking runtime is different in SAS compared to other programs, where you have to request [...]

End-to-End AI Pipelines: Can Academics Be Taught How to Do Them?

End-to-end AI pipelines are being created routinely in industry, and one complaint is that academics [...]

Referring to Columns in R by Name Rather than Number has Pros and Cons

Referring to columns in R can be done using both number and field name syntax. [...]

The Paste Command in R is Great for Labels on Plots and Reports

The paste command in R is used to concatenate strings. You can leverage the paste [...]

Coloring Plots in R using Hexadecimal Codes Makes Them Fabulous!

Recoloring plots in R? Want to learn how to use an image to inspire R [...]

Adding Error Bars to ggplot2 Plots Can be Made Easy Through Dataframe Structure

Adding error bars to ggplot2 in R plots is easiest if you include the width [...]

AI on the Edge: What it is, and Data Storage Challenges it Poses

“AI on the edge” was a new term for me that I learned from Marc [...]

Pie Chart ggplot Style is Surprisingly Hard! Here’s How I Did it

Pie chart ggplot style is surprisingly hard to make, mainly because ggplot2 did not give [...]

Time Series Plots in R Using ggplot2 Are Ultimately Customizable

Time series plots in R are totally customizable using the ggplot2 package, and can come [...]

Data Curation Solution to Confusing Options in R Package UpSetR

Data curation solution that I posted recently with my blog post showing how to do [...]

Making Upset Plots with R Package UpSetR Helps Visualize Patterns of Attributes

Making upset plots with R package UpSetR is an easy way to visualize patterns of [...]

4 Comments

Making Box Plots Different Ways is Easy in R!

Making box plots in R affords you many different approaches and features. My blog post [...]

Convert CSV to RDS When Using R for Easier Data Handling

Convert CSV to RDS is what you want to do if you are working with [...]

GPower Case Example Shows How to Calculate and Document Sample Size

GPower case example shows a use-case where we needed to select an outcome measure for [...]

Querying the GHDx Database: Demonstration and Review of Application

Querying the GHDx database is challenging because of its difficult user interface, but mastering it [...]

Variable Names in SAS and R Have Different Restrictions and Rules

Variable names in SAS and R are subject to different “rules and regulations”, and these [...]

Referring to Variables in Processing Data is Different in SAS Compared to R

Referring to variables in processing is different conceptually when thinking about SAS compared to R. [...]

Counting Rows in SAS and R Use Totally Different Strategies

Counting rows in SAS and R is approached differently, because the two programs process data [...]

Native Formats in SAS and R for Data Are Different: Here’s How!

Native formats in SAS and R of data objects have different qualities – and there [...]

SAS-R Integration Example: Transform in R, Analyze in SAS!

Looking for a SAS-R integration example that uses the best of both worlds? I show [...]

Dumbbell Plot for Comparison of Rated Items: Which is Rated More Highly – Harvard or the U of MN?

Want to compare multiple rankings on two competing items – like hotels, restaurants, or colleges? [...]

2 Comments

Data for Meta-analysis Need to be Prepared a Certain Way – Here’s How

Getting data for meta-analysis together can be challenging, so I walk you through the simple [...]

Sort Order, Formats, and Operators: A Tour of The SAS Documentation Page

Get to know three of my favorite SAS documentation pages: the one with sort order, [...]

Confused when Downloading BRFSS Data? Here is a Guide

I use the datasets from the Behavioral Risk Factor Surveillance Survey (BRFSS) to demonstrate in [...]

2 Comments

Doing Surveys? Try my R Likert Plot Data Hack!

I love the Likert package in R, and use it often to visualize data. The [...]

2 Comments

I Used the R Package EpiCurve to Make an Epidemiologic Curve. Here’s How It Turned Out.

With all this talk about “flattening the curve” of the coronavirus, I thought I would [...]

Which Independent Variables Belong in a Regression Equation? We Don’t All Agree, But Here’s What I Do.

During my failed attempt to get a PhD from the University of South Florida, my [...]

Connecting SAS to other applications is often necessary, and there are many ways to do it. Read this blog post for a couple of use-cases of SAS data integration using various SAS components.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Verified by MonsterInsights