Introduction to SAS and SAS Programs
Author: Elvin Wilson

1 Introduction to SAS and SAS Programs

2 Agenda
In this session, you will learn about: an overview of SAS; an introduction to SAS programming; and DATA step processing, compilation, and execution.

3 Overview of SAS
Recommended Additional Reading: SAS Statistics by Example by Ron Cody

4 What Is SAS?
SAS stands for Statistical Analysis System. It is a collection of modules that are used to process and analyze data. SAS is a highly flexible and integrated software environment that can be used in virtually any setting to access, manipulate, manage, store, analyze, and report on data.

5 History of SAS
The history of SAS dates back to the late ’60s and early ’70s, when it began as a statistical package.
The Need: The National Institutes of Health (NIH) started a project to analyze the agricultural data of the USA. This required sophisticated computers.
The Collaborators: North Carolina State University (NCSU) had powerful mainframe computers. Programmers and econometricians from NCSU and seven other universities teamed up to develop the software.
The Developer: The product was created by an NCSU student, Anthony J. Barr. The software was named Statistical Analysis System (SAS).

6 History of SAS – Contd.
Milestones shown on the timeline (1972, 1976, 1980, 1990, 2010, 2013):
NIH funding ended, but the project continued with support from members.
SAS was incorporated as SAS Institute Inc.
SAS was rewritten in the C language, enabling it to run on many different platforms.
SAS was reoriented and developed for many vertical industries.
NC State University launched the first Master of Science in Analytics degree program.
High Performance Analytics, Visual Analytics, Text Mining Analytics, and other products were introduced.
SAS 9.4 was released in July 2013 with in-memory and in-database processing capability, Massively Parallel Processing (MPP), and a cloud-friendly architecture.
Did you know? In its first version in 1976, Base SAS had 300,000 lines of code written on punch cards. Today, it has 170 million lines of code.

7 Why SAS?
#1 market leader in analytics, and the largest independent vendor in the business intelligence market.
The de facto industry standard for clinical data analysis.
Used in 70,000+ companies in over 137 countries.
Integrated platform for end-to-end solutions: SAS provides an integrated set of software products, services, and technologies for information management, advanced analytics, and reporting.
Business solutions across domains and industries: unmatched domain-specific, industry-focused analytics solutions.
Described as an “Analytics powerhouse” in The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013.

8 SAS Functionalities
SAS enables programmers to perform the following tasks: information retrieval and data management; report writing and graphics; statistical analysis, econometrics, and data mining; business planning, forecasting, and decision support; operations research and project management; quality improvement; applications development; data warehousing; and extract, transform, and load (ETL).

9 Features of SAS
SAS can process the following: millions of rows; thousands of columns; outputs in various formats, including CSV, PDF, and XML; interactions with the operating system; interactions with databases; built-in statistical and random number functions; functions for character and number manipulation; and comprehensive date and time handling functions.

10 Platforms that Support SAS
SAS runs on the following platforms: OpenVMS Alpha, IBM mainframes, Unix, Linux, and Microsoft Windows.

11 SAS Products
64 SAS products are available, covering everything from BI and statistics to predictive modeling: Base SAS, SAS/ACCESS, Base Access for PC Files, SAS Add in for Microsoft Files, SAS/AF, SAS/SCL, SAS/ASSIST, SAS/CALC, SAS/CONNECT, SAS/DMI, SAS/ETS, SAS/FSP, SAS/GIS, SAS/GRAPH, SAS/IML, SAS/INSIGHT, SAS/Integration Technology, SAS/LAB, SAS/INTERNET, SAS/OR, SAS/PH-clinical, SAS/QC, SAS/SHARE, SAS/SHARE NET, SAS/SPECTRA VIEW, SAS/STAT, SAS/TOOLKIT, SAS/WAREHOUSE ADMINISTRATOR, SAS/WEB REPORT STUDIO, SAS/FINANCIAL MANAGEMENT, SAS/STRATEGY MANAGEMENT, SAS/WEB OLAP VIEWER FOR NET, SAS Web OLAP viewer for Java, BI Dashboard, Data Integration Studio, SAS Enterprise Business Intelligence Server, Enterprise Computing Offer, Enterprise Guide, Enterprise Miner, Information Delivery Portal, and Information Map Studio.

12 Applications of SAS
SAS has various business applications: business intelligence, financial management, text mining, human resource management, money laundering, customer relationship management, clinical data, and many more.

13 Components of the SAS System
Base SAS; Data Access and Management; User Interfaces; Application Development; Analytical; Reporting and Graphics; Visualization and Discovery; Business Solutions; Web Enablement.

14 How Data is Turned into Information
Data (raw facts and figures) is processed into information (contextualized data) in four stages: Access, Manage, Analyze, and Present. DATA steps create SAS data sets, and PROC steps process them.

15 How Data is Turned into Information
ACCESS [20%]: In this step you perform the role of a developer of analysis datasets. You are provided with the raw datasets from which the analysis datasets need to be created.
MANAGE [50%]: In this step you validate and clean the data and create the master dataset. This includes merging, concatenating, appending, sorting, and summarizing, so that the result can be used for further analysis.
ANALYZE [20%]: In this step you analyze the data to capture insights that support strategic decisions.
PRESENT [10%]: In this step you take data from a source such as a database, a SAS dataset, or a spreadsheet and use it to produce a document in a format that suits a particular human readership (graphs, reports, etc.).
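The four stages above can be sketched as one short SAS program. This is only an illustration: the data set names (work.employees, work.salaries, work.master) and all values are hypothetical, and the raw data is read inline rather than from a file so that the sketch is self-contained.

```sas
/* ACCESS: read two small raw tables (hypothetical values, inline data) */
data work.employees;
  input EmployeeID FirstName $ City $;
  datalines;
101 Asha Pune
102 Ravi Mumbai
103 Meera Pune
;
run;

data work.salaries;
  input EmployeeID Salary;
  datalines;
103 52000
101 48000
102 61000
;
run;

/* MANAGE: sort both tables by the key, then merge them into a master data set */
proc sort data=work.employees;
  by EmployeeID;
run;

proc sort data=work.salaries;
  by EmployeeID;
run;

data work.master;
  merge work.employees work.salaries;
  by EmployeeID;
run;

/* ANALYZE: summarize salary by city */
proc means data=work.master mean max;
  class City;
  var Salary;
run;

/* PRESENT: print a titled listing of the master data set */
title "Employee Master Listing";
proc print data=work.master noobs;
run;
title;
```

Sorting before the merge matters here: a MERGE with a BY statement expects each input data set to be ordered by the BY variable.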

16 Architecture of the SAS System
Multi-vendor architecture: PC, workstation, servers/midrange, mainframe, supercomputer.

17 Architecture of the SAS System
Multi-engine architecture, with data from: DB2, SAP, Teradata, Oracle, Sybase, Microsoft Excel.

18 SAS Processing Modes
SAS has 4 processing modes in 2 categories.
Foreground processing: SAS window environment, interactive line mode, and non-interactive mode.
Background processing: batch mode.
If the processing requirements change, the user might find it helpful to change from one processing mode to another.

19 SAS Processing Modes: Foreground Processing
Foreground processing begins immediately, but the current workstation session is occupied while the program runs. Until the running process ends, the session cannot be used for other work. The user can route the output to the workstation display, to a file, to a printer, or to tape.
Best suited for: learning SAS programming; testing a program to see if it works; fast turnaround; processing a fairly small data file.

20 SAS Processing Modes: Background Processing
Batch processing is the only way to run SAS in the background. The operating environment coordinates all the work, so the workstation session can be used for other work while the program is running. The program may have to wait in the input queue before it is executed.
Best suited for: an experienced SAS user, who is likely to make fewer errors than a novice; running a program that has already been tested and refined; a program that requires a longer time; processing a large data file.

21 Processing in the SAS Window Environment
The SAS window environment is an interactive graphical user interface (GUI) that consists of a series of windows (panels).
Explorer/Results panel: manage the files.
Program Editor panel: edit and execute programs.
Output panel: view program output.
Log panel: view program logs and the SAS session.
The SAS windowing environment is the default environment for a SAS session. A single session can be used to prepare and submit a program and, if necessary, to modify and resubmit the program after browsing the output and logs.

22 Processing in the SAS Window Environment
The Results and Explorer windows open when SAS is started. They present a hierarchical system of folders, subfolders, and individual items, and provide a primary graphical interface to SAS that can be used to:
Access and work with data, such as catalogs, tables, libraries, and operating environment files (data/file management).
Open SAS programming windows.
Access the Output Delivery System (ODS).
Create and define customized folders.
Open and edit SAS files.
View or set libraries and file shortcuts, and view or set library members and catalog entries.
The Results window helps to manage the contents of the Output window.

23 SAS Environment Windows
In the SAS windowing environment, you submit a SAS program and view its results using three primary windows.
The Editor window contains the SAS program to submit.
The Log window contains information about the processing of the SAS program, including any warning and error messages.
The Output window contains reports generated by the SAS program.

24 SAS Environment Editor Windows
The SAS Environment Editor window is used for writing SAS programs. All components of a SAS program (data manipulation steps, procedures that conduct analysis, comments, and titles) are written here. SAS programs are saved in the .sas file format. To run a program, click the Submit button or press F3. Selected parts of the code can also be submitted on their own.

25 SAS Environment Editor Windows
The slide shows an example of a SAS program in each of the editor windows: the Enhanced Editor and the Program Editor. Since there is only one Program Editor, its title bar always says Program Editor. You can have more than one Enhanced Editor window, so if your program has been saved to a file, you will see the name of the file but not the words Enhanced Editor. Notice some of the other differences.

26 SAS Environment Log Windows
Browse through the SAS log, which is generated after the code has been submitted, to check whether the code ran correctly (syntactically). Notes, warnings, and errors are shown here. The text in the log cannot be altered.

27 SAS Environment Output Windows
The output of the SAS program is displayed here, and the file is saved with the .lst extension. Check the output to see whether the results from the program are as expected. Some options can be used to suppress output here (e.g. NOPRINT) or to direct the output to other files (e.g. ODS).
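As a rough sketch of the two options mentioned above, the program below uses NOPRINT to suppress a report and ODS to send a report to a PDF file. It assumes the sample data set sashelp.class that ships with most SAS installations; the output names work.stats, AvgHeight, and class_report.pdf are hypothetical.

```sas
/* NOPRINT: compute statistics without writing a report to the Output window; */
/* the result is stored in the data set work.stats instead                    */
proc means data=sashelp.class noprint;
  var Height;
  output out=work.stats mean=AvgHeight;
run;

/* ODS: route procedure output to a PDF file (hypothetical file name) */
ods pdf file="class_report.pdf";
proc print data=sashelp.class;
run;
ods pdf close;
```

The ODS PDF CLOSE statement is what finishes writing the file, so it should not be left out.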

28 Demonstration and Activity
Recommended Additional Reading: Learning SAS by Example by Ron Cody

29 Additional Window Environments
Additional window environments enable you to: access online help; create and store text information; view and change some SAS system options; and view and change function key settings.

30 What is a SAS Program?
A SAS program is a sequence of steps that the user submits for execution. In a typical flow, a DATA step turns raw data into a SAS data set, and a PROC step turns a SAS data set into a report.
DATA steps are typically used to create and update SAS data sets.
PROC steps are typically used to process SAS data sets (i.e. sort data, manage data, generate reports and graphs, and perform data analysis).
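A minimal sketch of such a program follows: one DATA step that creates a data set from raw values, then one PROC step that reports on it. The data set name work.scores, the variable names, and the values are hypothetical, and the pass mark of 50 is an arbitrary assumption.

```sas
/* DATA step: read raw values and create the SAS data set work.scores */
data work.scores;
  input Name $ Score;
  Passed = (Score >= 50);   /* 1 if the condition is true, 0 otherwise */
  datalines;
Anu 72
Ben 45
Chi 88
;
run;

/* PROC step: process the data set and produce a report */
proc print data=work.scores;
run;
```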

31 Invoking SAS Programs
You can invoke SAS in three ways: batch mode, interactive mode, and non-interactive mode.

32 Invoking SAS Programs
Invoking SAS in the Window Environment: To invoke the SAS window environment, execute the SAS command followed by the DMSEXP option, which activates the window environment.
Processing Interactively in Line Mode: With line mode processing, programming statements are entered one line at a time. DATA and PROC steps are executed after a RUN statement is entered, or after another step boundary. Program messages and output appear on the monitor.

33 Invoking SAS Programs
Invoking SAS in Line Mode: To invoke SAS in line mode, execute the SAS command followed by the NODMS option, which activates an interactive line mode session.
Processing in Batch Mode: The first step in executing a program in batch mode is to prepare files that include any control language statements required by the operating environment and the SAS statements necessary to execute the program. The program file is then submitted to the operating environment. The log and output are routed to the destination that you specify in the operating environment control language; without a specification, they are routed to the default destination.

34 Invoking SAS Programs
Processing Non-interactively: Non-interactive processing has some characteristics of interactive processing and some of batch processing. When a process is run non-interactively, SAS program statements that are stored in an external file are executed. Use a SAS command to submit the program statements to the operating environment.
Executing a Program in Non-interactive Mode: To run a program in non-interactive mode, there is no need to enter a SAS session as in interactive mode; instead of starting a SAS session, a SAS program is executed. To run the program, specify the SAS command at the system prompt, followed by the complete name of the file and the system options.

35 SAS Program - Sample Code
SAS is a sequential programming language built from steps. Each step is a sequence of SAS statements.
    Data WORK.Employee;
      infile 'C:\Work\Training\SAS\BASE SAS\Prog1\Project\Employees.csv'
        delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
      informat EmployeeID best32. ;
      informat LastName $29. ;
      informat FirstName $28. ;
      informat City $28. ;
      format EmployeeID best12. ;
      format LastName $29. ;
      format FirstName $28. ;
      format City $28. ;
      input EmployeeID LastName $ FirstName $ City $;
    Run;
    Proc print data=Employee (obs=15);
      Var EmployeeID FirstName City ;
    Run;

36 SAS Program – Statements
All SAS statements begin with SAS keywords. All SAS statements end with semicolons.
    Data WORK.Employee;
      infile 'C:\Work\Training\SAS\BASE SAS\Prog1\Project\Employees.csv'
        delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
      input EmployeeID LastName $ FirstName $ City $;
    Run;
    Proc print data=Employee (obs=15);
      Var EmployeeID FirstName City ;
    Run;
The semicolon rule is important: if you leave out a semicolon where one is needed, the program may not run correctly, and the resulting error messages can be hard to interpret.

37 Demonstration and Activity
Recommended Additional Reading: SAS Statistics by Example by Ron Cody

38 SAS Program – Syntax Rules
SAS statements are free-format:
A statement can begin and end in any column.
A single statement can span multiple lines.
Several statements can be on the same line.
Every statement ends with a semicolon.
Keywords and names are not case sensitive.
One or more blanks or special characters can be used to separate words.
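To illustrate these rules, the sketch below writes the same DATA step three ways. Each version is valid and creates the same one-observation data set. The name work.demo and the variables x and y are hypothetical.

```sas
/* Conventional layout: one statement per line */
data work.demo;
  x = 1;
  y = x + 1;
run;

/* Same step: mixed case, several statements on one line */
DATA work.demo; X = 1; Y = x + 1; RUN;

/* Same step: single statements spanning several lines */
data
  work.demo
  ;
  x = 1; y =
    x + 1;
run;
```

All three forms are legal, but the first is much easier to read and debug.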

39 SAS Program – Comment Statement
There are two ways to add a comment to your SAS program.
1. Start it with a slash star (/*) and end it with a star slash (*/): /* comment */
2. Start it with an asterisk (*) and end it with a semicolon (;): * comment ;
You may even embed comments of the first type within a SAS statement.
EXAMPLE
    input Gender $ Age /* age is in years */ Height Weight;
    * input Gender $ Age Height Weight;

40 SAS Program – Comment Example
    data WORK.Employee;
      infile 'C:\Work\Training\SAS\BASE SAS\Prog1\Project\Employees.csv'
        delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
      informat EmployeeID best32. ;
      informat LastName $29. ;
      informat FirstName $28. ;
      informat City $28. ;
      format EmployeeID best12. ;
      format LastName $29. ;
      format FirstName $28. ;
      format City $28. ;
      input EmployeeID LastName $ FirstName $ City $;
    run;
The program on this slide contains four comments. The first comment is extra text at the beginning of the program. The second comment is within a statement. The third comment comments out a step. The fourth comment comments out a statement.
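The four kinds of comment described above might look like the following. This is a sketch rather than the slide's original program: the data set work.employee, the inline values, and the comment text are all hypothetical, and inline data replaces the CSV file so the example is self-contained.

```sas
/* 1. Extra text at the start: read employee records from inline data */
data work.employee;
  input EmployeeID LastName $ /* 2. a comment inside a statement */ City $;
  datalines;
1 Rao Pune
2 Shah Delhi
;
run;

/* 3. Commenting out a whole step
proc print data=work.employee;
run;
*/

proc print data=work.employee;
  * 4. Commenting out one statement: var EmployeeID City;
run;
```

Because the fourth comment disables the VAR statement, PROC PRINT lists all variables.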

41 Demonstration and Activity
Recommended Additional Reading: SAS Statistics by Example by Ron Cody

42 SAS Program – Other Essentials
Other essentials of SAS programming:
TITLE: creates a title in the output. Example: TITLE "Add title here!";
SYSTEM OPTIONS: commands that affect the SAS session, processing, and outputs. Example: options obs =
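A brief sketch of both statements in use follows. The value 5 for OBS= and the title text are arbitrary choices for illustration, since the slide does not give a value. The example assumes the sample data set sashelp.class is available.

```sas
options obs=5;                 /* process at most 5 observations per data set */
title "First Five Students";   /* hypothetical title text */

proc print data=sashelp.class;
run;

options obs=max;               /* restore the default so later steps read all rows */
title;                         /* clear the title */
```

System options stay in effect for the rest of the session until they are changed, which is why the sketch resets OBS= afterwards.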

43 SAS Program – Step Boundaries
START: SAS steps begin with either of the following:
A DATA statement (DATA steps are typically used to create and update SAS data sets).
A PROC statement (PROC steps are typically used to process SAS data sets).
STOP: SAS detects the end of a step when it encounters:
A RUN statement (for most steps).
A QUIT statement (for some procedures).
The beginning of another step (a DATA statement or PROC statement).
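The three kinds of step boundary can be seen in one short program. This sketch assumes the sample data set sashelp.class is available; the data set name work.heights and the label text are hypothetical.

```sas
data work.heights;             /* a DATA statement starts a step */
  set sashelp.class;
run;                           /* RUN ends the DATA step */

proc datasets library=work nolist;
  modify heights;
  label Height = "Student height";
quit;                          /* QUIT ends this procedure */

proc print data=work.heights;  /* no RUN statement here */
proc means data=work.heights;  /* this PROC statement ends the PROC PRINT step */
  var Height;
run;
```

Relying on the next step to close the previous one works, but an explicit RUN after every step makes the log easier to follow.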

44 Common Error Messages in SAS
When a SAS program is executed, SAS generates a log. The log contains three types of messages: notes, warnings, and errors. The following slides cover notes and warnings, errors, and strategies for finding and correcting errors.

45 Common Error Messages in SAS
Notes and Warnings: Although notes and warnings do not cause the program to terminate, they are worthy of your attention.
The Error: An error indicates that the program has failed and stopped execution.
Strategies for Finding and Correcting Errors:
Step I: Start at the beginning. Do not become alarmed if your program has several errors in it. Sometimes a single error at the beginning of the program causes the others.
Step II: Debug your programs one step at a time. SAS executes programs in steps, so even if an early step contains an error, SAS will still try to execute all subsequent steps. Correct each step before proceeding to the next one.

46 Syntax Errors
Syntax errors occur when program statements do not follow the rules of the SAS language. Common causes are: a missing semicolon, a wrong data type, and a misspelling.

47 SAS Errors - Missing Semicolon
A missing semicolon causes SAS to misinterpret not only the statement where the semicolon is missing, but possibly several statements that follow. The syntax of the following program is correct except for the missing semicolon on the PROC PRINT statement:
    proc print data = auto
    var make mpg;
    run;
ERROR : The option or parameter is not recognized.
NOTE: The SAS System stopped processing this step because of errors.
The missing semicolon causes SAS to read the two statements as a single statement. As a result, the VAR statement is read as an option to the procedure. Since there is no VAR option in PROC PRINT, the program fails.

48 SAS Errors - Misspellings
Sometimes SAS corrects a spelling mistake by making its best guess at what you meant. It then continues execution and issues a warning explaining the assumption it has made. In the following program the word "DATA" is misspelled. If we were to run this program, SAS would correct the spelling and run the program but issue a warning.
    DAT auto ;
    INPUT make $ mpg rep78 weight foreign ;
    CARDS;
    AMC
    AMC
    AMC
    ;
    run;
A misspelled variable name is not corrected. In the next step, weight is misspelled as wieght, so SAS treats wieght as a new, uninitialized variable:
    data auto2;
    set auto;
    ratio = mpg/wieght;
    run;
NOTE: Variable WIEGHT is uninitialized.
NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 6 at 77:15
NOTE: The data set WORK.AUTO2 has 26 observations and 7 variables.
In other cases, when SAS cannot resolve a spelling error, the step terminates and SAS issues an error statement or a note underlining the word, or words, it does not recognize.

50 SAS Errors - Misspellings Error
    proc print
    var make mpg weight;
    run;
ERROR : Syntax error, statement will be ignored.
NOTE: The SAS System stopped processing this step because of errors.
In this example, there is nothing wrong with the VAR statement. Adding a semicolon to the PROC PRINT statement solves the problem.
    proc print;
    var make mpg weight;
    run;
Wrong Data Type
    data test;
    input a b;
    cards;
    john 1
    megan 2
    ;

51 Demonstration and Activity
Recommended Additional Reading: SAS Statistics by Example by Ron Cody

52 SAS Errors - Wrong Data Type
    proc print data = test;
    run;
Output:
    Obs    a    b
      1    .    1
      2    .    2
Log:
    data test;
    input a b;
    2310 cards;
    NOTE: Invalid data for a in line
    RULE:
    john 1
    a=. b=1 _ERROR_=1 _N_=1
    NOTE: Invalid data for a in line
    megan 2
    a=. b=2 _ERROR_=1 _N_=2
    NOTE: The data set WORK.TEST has 2 observations and 2 variables.
    NOTE: DATA statement used (Total process time): real time seconds cpu time seconds
    2313 ;
    2314 run;
    proc print data = test;
    2316 run;
    NOTE: There were 2 observations read from the data set WORK.TEST.
    NOTE: PROCEDURE PRINT used (Total process time): real time seconds cpu time seconds
Indeed, there are no error messages in red, but each NOTE offers some detailed information. The first NOTE says that the data for variable "a" is invalid in line 2311, position 1-4. Since line 2310 corresponds to the statement "cards;", line 2311 corresponds to the first line of data, which starts with john. So the NOTE is basically saying that "john" is not a valid numeric value. In this case, we just need to add a dollar sign after variable "a" in the INPUT statement.
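The fix described above, applied to the slide's own program, might look like this. The only change is the dollar sign after a in the INPUT statement.

```sas
data test;
  input a $ b;     /* the $ declares a as a character variable */
  cards;
john 1
megan 2
;
run;

proc print data = test;
run;
```

With this change the names are read as character values, so the "Invalid data" notes no longer appear in the log.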

53 SAS Program – Syntax Errors
Syntax errors occur when program statements do not follow the rules of the SAS language. Examples include: misspelled keywords, mismatched quotation marks, missing semicolons, and invalid options. When SAS encounters a syntax error, it prints a warning or an error message to the log.
ERROR : Syntax error, expecting one of the following: a name, a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_.

54 SAS Data Sets
SAS organizes data into a rectangular form, or table, that is called a SAS data set. Each row represents information about an individual entity and is called an observation. Each column represents the same type of information and is called a variable. Variable names identify the columns, and the cells hold the values. The DATA step is used to create and modify data sets. A SAS data set is stored in the .sas7bdat file format.
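As a small illustration of observations and variables, the sketch below builds a data set with three rows and three columns and then inspects it. The name work.cars, the variable names, and all values are hypothetical.

```sas
/* Three observations (rows), each with three variables (columns) */
data work.cars;
  input Make $ Mpg Weight;
  datalines;
Alpha 22 2930
Beta 17 3350
Gamma 25 2640
;
run;

/* PROC CONTENTS describes the data set: its variables, types, and lengths */
proc contents data=work.cars;
run;

/* PROC PRINT lists the values, one observation per row */
proc print data=work.cars;
run;
```

After the DATA step runs, the log should report that WORK.CARS has 3 observations and 3 variables.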

55 Demonstration and Activity
Recommended Additional Reading: SAS Statistics by Example by Ron Cody

56 Contact
Thank you.
Mumbai | Pune | Bangalore | Delhi - NCR | Hyderabad | Chennai | Coimbatore