Connecting SAS to Other Applications: Different Strategies

Did you know it is possible to integrate SAS with other data environments, like Microsoft SQL or Excel?

Connecting SAS to other applications has had different implications over time. In my book, “Mastering SAS Programming for Data Warehousing”, I describe different ways of connecting SAS to other applications that I have been doing since the early 2000s. Regardless of whether you are using PC SAS or a SAS server, you can typically use the SAS Access component to open up an ODBC connection to another data environment (e.g., MS SQL or Excel).

But connecting SAS components together is kind of like making functional things with Legos, in that you have to put them together to make something that runs and accomplishes data tasks. I’ve found it helpful for SAS users (and non-SAS users, really) to explain not only how you can go about connecting SAS components together, but how you can actually develop a pipeline that works and does something you want with data.

Use Case #1: SAS Enterprise Guide (EG) for Connecting SAS to Other Environments

SAS Enterprise Guide (EG) is a separate application from SAS – and by that, I mean SAS EG versions separately from SAS itself. SAS EG was invented in the 1990s, and so it has kind of an XP look. I made a diagram of the interface that I included in my book that I’ll show you below.

SAS Enterprise Guide allows you to configure pipelines in SAS. You can arrange the windows as you like.

SAS EG Interface

I emphasize that the above is a diagram I drew – not a real screen shot. There are two reasons for this. First, I don’t want to have a copyright violation by taking a screen shot from someone else, because I do not have access to an instance of SAS EG.

But second – and perhaps, more pragmatically – it is very hard to arrange a SAS EG window and make it look good (kind of like trying to arrange an ERD to look good in SQL). As you can see in the diagram, I am showing an example of five windows opened and arranged: Process Flow, Project Tree, Tasks, Results, and Program. If you were really logged into an instance of SAS EG, you’d realize there are many more windows you could open and close and arrange yourself.

SAS EG for Connecting SAS to Other Environments

What you will realize if you actually use SAS EG is that it is actually a way of creating pipelines in SAS to make things more automated. In other words, you first have to have all your SAS code together and a basic plan of the steps you need to carry out in SAS. Then, you can automate it in EG.

Let’s imagine a scenario. Let’s say that you work on an analytics team that studies sales at your company, and you run new models each week. You have a main sales database in MS SQL, but you are an analyst, so you work in the company’s SAS environment. One of the things you do in SAS is make projections of future sales based on past sales – but in order to do that, you need regular weekly data reports from the live MS SQL database. Here is an example of a pipeline you might do manually that you could automate in EG.

You need to map out your data pipeline before you configure it in SAS Enterprise Guide. You will need to drag the objects out and adjust their properties.

Before we walk through this graphic, you have to imagine that you are sitting at your SAS server using SAS EG – which is connected to your SAS data stores already. Now we start the pipeline at the upper left. If you were doing things manually, your first step would be to fire up SAS Access and connect into the MS SQL sales server.

This presumes that you have already worked with the MS SQL server administrators, and together, you have built a SQL view to which your account has access. Then, when you use SAS Access to connect, you hit the view. Let’s say it is parameterized, so you can run the view for a certain time period (e.g., weekly). You need to pass it the parameter – which is easy to do manually if running the MS SQL view, but something you would have to automate if you doing it in EG.

Once you have the view in MS SQL, then in SAS, you’d use data step language or PROC SQL to save the view as a SAS dataset in your SAS environment. I recommend figuring out how to do all this manually first – and even making a diagram like this – before trying to build it in EG. My video below goes through a SAS white paper on the topic.

Use Case #2: Connecting SAS with Data Integration (DI) Studio

SAS Data Integration Studio, or DI Studio for short, is different than EG in that it is mainly for data integration. EG can do a lot of different things, because it basically builds pipelines. By contrast, SAS DI Studio is focused on data integration only, so it is optimized for connecting data stores to each other.

But as usual, SAS is very “SAS-centric” in that it always assumes you are in a SAS environment, and if there are data outside that environment that you need, your goal is to move it into that environment. The two main reasons why SAS typically assumes this with their products is that for one, I/O is very costly when moving data from outside the SAS environment into it. Secondly, SAS is an analytics platform – so theoretically, if you are using SAS, all of your analytics should be taking place within the SAS environment, so you need to start by putting all your data there. You are probably taking copies of data from other systems and moving them into your SAS environment for analysis, so when seen this way, it makes sense that SAS is so SAS-centric.

That is my preamble to the following diagram, where I demonstrate a use-case for SAS DI Studio at a research institute.

If you are using SAS because you are doing research, you probably have an analytics environment. Therefore, you want to move your data into that environment before analyzing it.

In the diagram, the biostatistician is in the SAS environment – along with the live research data on the SAS server, and the SAS DI Studio component. Like EG, DI Studio can set up pipelines – but again, you have to design things manually first.

The diagram imagines the statistician is trying to audit data in a research study. Specifically, the statistician is trying to figure out if all the enrolled participants have all their labs in (from the external SQL lab database in green), and have completed all their follow-up surveys (from the pink Excel dataset). So the statistician could set up pipelines in DI Studio to use SAS Access to go out and connect to the lab data in SQL and the survey data in Excel, pull them into the SAS environment, and compare them with the data in the SAS server.

The video below covers a SAS white paper that goes into more detail about using SAS DI Studio, including how to do bulk loading.

Updated July 29, 2023. Added enterprise guide video September 6, 2023.

Read all of our data science blog posts!

Connecting SAS to other applications is often necessary, and there are many ways to do it. Read this blog post for a couple of use-cases of SAS data integration using various SAS components.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Verified by MonsterInsights