Chapter 1: Using SAS in a Data Mart, Data Lake, or Data Warehouse
Every SAS book on data warehousing (or on any other topic) has to start by covering a little history. This is because the main challenge with SAS when working with big data is optimizing data input/output (I/O). This is because originally, SAS had to read data from data cards. Now, that way of reading data remains pretty unique to SAS. Therefore, you really have to understand how SAS is built and functions in order to do a good job managing extract-transform-load (ETL) protocols in a data warehouse.
Chapter 2: Reading Big Data into SAS
Topics covered: Importing datasets into the SAS environment.
What you learn how to do: Read data into SAS in various formats. These include SAS’s sas7bdat and XPT formats, and other more standard formats, like *.csv and *.txt.
Chapter 3: Helpful PROCs for Managing Data
Topics covered: Ways to use SAS to look at data while you are preparing to process it, it so you can make sense of it while writing ETL code.
What you learn how to do: Use PROC CONTENTS with options, use SAS labels and SAS format files, and use PROC PRINT and PROC SQL to view data in SAS.
Chapter 4: Managing ETL in SAS
Topics covered: The kinds of decisions you have to make when setting up a data warehouse to prepare for ETL, along with recommendations on how to make those decisions.
What you learn how to do: Set up a technological environment for the data warehouse, and develop policies for naming conventions, formats, labels, arrays, and data transfers.
Chapter 5: Managing Data Reporting in SAS
Topics covered: What the output delivery system (ODS) in SAS is, and how it is used in PROCs that do reporting.
What you learn how to do: Use the ODS to output data tables produced by PROCs, and use early ODS implementations, like PROC REPORT and PROC TABULATE.
Chapter 6: Standardizing Coding Using SAS Arrays
Topics covered: In order to optimize I/O in a SAS data warehouse, it is necessary to use array processing. This chapter covers how to manage these arrays, and the limitations they impose.
What you learn how to do: Use array processing with conditions to create variables, with a demonstration on how to create an index variable using this approach.
Chapter 7: Designing and Developing ETL Code in SAS
Topics covered: How to actually design a data warehouse, in terms of what variables to develop, and how to standardize serving up those variables through ETL and other protocols.
What you learn how to do: Study the native data you receive from data source providers, design variables for the data warehouse that you think the users will find useful, and develop efficient and well-documented ETL code in SAS to create them.
Chapter 8: Using Macros to Automate ETL in SAS
Topics covered: How to decide whether or not to automate ETL code with SAS macros and macro variables, and if you decide to go ahead with it, how to do it.
What you learn how to do: How to convert working data step code to macro code, and how to store and call macro code separately from the main ETL code.
Chapter 9: Debugging and Troubleshooting in SAS
Topics covered: The trick in SAS to writing code that runs is writing “well-formed” SAS code using a step-by-step process. In this chapter, we go over this, as well as how to debug code that is developed this way.
What you learn how to do: I give tips on debugging well-formed SAS data step code, do loop code, and SAS macros.
Chapter 10: Considering User Needs of SAS Data Warehouses
Topics covered: How to determine the needs of data warehouse users – both analysts and developers – and serve their needs so the data warehouse or data lake successfully meets its business purpose.
What you learn how to do: Create data access policies, manage a data stewardship committee, and serve up foreign keys and crosswalk variables in the data warehouse.
Chapter 11: Connecting the SAS Data Warehouse to Other Systems
Topics covered: How to take data out of a SAS data warehouse and put it in another database environment, and how to connect to another database environment and copy data into a SAS data warehouse.
What you learn how to do: Set up a de-identification strategy for warehouse data, set up a star schema, create an open database connection (ODBC) to a database in another environment (such as MS SQL), and use data steps and PROC SQL to copy data back and forth from SAS to other environments.
Chapter 12: Using the ODS for Visualization in SAS
Topics covered: Today, the ODS is automatically deployed when SAS produces a visualization. We cover how that works in modern implementations.
What you learn how to do: Use complex macros in reporting, use SAS Enterprise Guide as a way to publish an interactive dashboard to the web, use SAS Viya for data profiling, and visualize SAS data in R and Tableau.
Are you a data science author who would like to trade book or course reviews with me?
If so, please e-mail me at [email protected] and let’s discuss it.
Published October 20, 2020. Video added October 21, 2020. FTC disclaimer added December 5, 2020.
SAS is known for big data and data warehousing, but how do you actually design and build a SAS data warehouse or data lake? What datasets do you include? How do you transform them? How do you serve warehouse users? How do you manage your developers? This book has your answers!