1/184
Flashcards for EPI514 Midterm Midterm Day: 10/18/2023 Lecture 1 - Lecture 10
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Labels
replace the variable name in the output
Adding Labels to Variables
makes the output significantly more readable
LABEL statement info:
Use a LABEL statement to assign a label to each variable.
After the keyword LABEL, enter a variable name followed by an equal sign and the desired label in quotes.
If the label is made in PROC step, it will only be used for that procedure.
If the label is made in DATA step, it will be applied permanently.
Formats
are used to change how the data is displayed.
PROC FORMAT
allows for creating user-defined formats
VALUE statement
used to define format
PROC FORMAT
use it to regroup values together
Format Library
If formats and labels are made in a DATA step, they are permanently associated with the data set
If the data set is saved permanently as a SAS data set with user-defined formats, you must store formats permanently as well
LIBNAME statement
use to create a libref (will create permanent data sets)
LIBRARY option in PROC FORMAT
use to save format library
OPTIONS statement
use to change SAS system options
FMTLIB option
use in PROC FORMAT to display the definitions of all formats in the library
SELECT statement
use to view only specific format
Reading Data from Excel into SAS
The IMPORT procedure can be used to convert Excel, STATA, SPSS, JMP, Microsoft Access, and other software’s data files into SAS data sets
PROC IMPORT
PROC IMPORT will scan a worksheet and determine variable types and lengths of character variables automatically
DATAFILE option used to specify the file you want to read
OUT option used to specify the name of the SAS data set you are
creating
DBMS option tells SAS the type of data to import, e.g., XLS, XLSX,
JMP, ACCESS, CSV
REPLACE tells SAS to replace the SAS data set if data set named in OUT option already exists
Different from DATA step which overwrites data set
Accessing Excel File Using Engine
Can use the XLSX LIBNAME engine to read and write to an Excel
file without converting it to a SAS data set
Can read an existing worksheet, replace a worksheet, or add a new
worksheet
Cannot change individual values in a worksheet
XLSX Libname Engine
will use the first line of the data set as the
variable name, determine each variable’s data type, assign a length
to character variables, and recognize dates and numeric variables
with dollar signs and commas
Uses LIBNAME statement to access excel file
As before a libref only remains associated with a SAS library for the
duration of the SAS session or until it is changed with another
statement
SAS Output Delivery System
Can use the SAS Output Delivery System (ODS) to create a CSV file from a SAS data set
Conditional Processing
Conditional processing allows program to make logical decisions based on data values
We looked at using IF statements previously to print observations
that satisfied certain conditions using the PUT statement
Useful for creating new variables and subsetting data
IF and ELSE IF Statements
The expression following an IF statement is evaluated and if the
expression is true the statement following THEN is executed
An ELSE statement can follow an IF/THEN statement and provides
an expression to evaluate if the IF statement is false
ELSE IF follows an IF/THEN statement and is used to provide an
additional IF statement to evaluate if the prior IF statement is false
If working with large data sets that requires the code to be as efficient as possible, place the if/else if statements most likely to be true first
Comparison Operators in SAS
Many different comparison operators in SAS
Can be expressed as a symbol or mnemonic equivalent
>= and <= are not supported in WHERE clauses or PROC SQL
Subsetting with IF Statement
We can use an IF statement to subset data
Place an IF statement in a DATA step without a THEN statement
If the condition is true, the DATA step continues
If the condition is false, the DATA step returns to the top and
proceeds with the next observation
In Operator
IN checks if a value is contained in a list of of values
Can also use IN with numerics
SELECT Statement (in place of IF/ELSE)
SELECT statement is an alternative to a series of IF and ELSE IF
statements
Select-expression follows the select statement
When-expression comes after select
Select-expression used to specify which variable we want to compare
to each of the when-expressions
If variable in select-expression is equal to the value in the
when-expression then the expression after WHEN is executed and we
skip to the END statement
If none of the comparisons are true the otherwise-expression after
OTHERWISE is executed
SELECT Statement (in place of IF/ELSE) not specify
We can also not specify a select-expression in the SELECT statement
Then each when-expression is evaluated until one is true and the
statement after is executed
As before, if all are false then otherwise-expression is executed
Boolean Operators
Can combine multiple comparisons using logical operators, also called boolean operators
NOT evaluated first, then AND, then OR
Let X, Y, and Z represent comparison statements
IF X AND Y OR Z; equivalent to IF (X AND Y) OR Z;
IF X AND NOT Y or z; equivalent to IF (X AND (NOT Y)) OR Z;
WHERE Statement
WHERE statements are very similar to IF statements, but they can
only be used for SAS data sets
Where Statement (List of Operators)
WHERE statements have a list of operators that cannot be used with
IF statements
Examples of Operators (Part 1)
IS MISSING
WHERE Age IS MISSING;
IS NULL
WHERE Age IS NULL;
BETWEEN AND
WHERE Age BETWEEN 20 AND 40;
CONTAINS
CONTAINS is case sensitive
WHERE Name CONTAINS ’er’;
Selects observations with name ’Jefferson’ and ’Peter’, but not ’Eric’
Examples of Operators (Part 2)
LIKE
LIKE is case sensitive
_can be any single character
% can be a string of any length
WHERE Name LIKE ’J s%’;
Would select Justin, Josh, Jessica
= ∗
Uses Soundex algorithm to compare whether the word sounds like
WHERE name =* ’Smith’;
Selects observations with names Smitt, Smythe, but not Schmitt
Looping
It is common to have a set of SAS statements that we want to execute multiple times
This can be accomplished in SAS using DO groups, DO loops, DO WHILE statements, and DO UNTIL statements
DO Groups
DO groups are often used with IF/ELSE statements when we have a group of statements that we want executed when the IF condition is TRUE
SUM Statement
The SUM statement adds the results of an expression to an accumulator variable
SUM statement has the following form
variable + expression
Variable is set to 0 initially
Variable is retained automatically
Missing values are ignored
Note there is no equal sign
RETAIN Statement
RETAIN statement causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next
SUM Statement (counter)
SUM statement is commonly used for creating counters
DO Loops
There are times where we want to execute the same code multiple times
DO Loops (plotting)
We can use a DO loop to increment X over a grid of points and calculate Y for each value of X
Can use an OUTPUT statement to create a data set that contains all of the (X, Y ) pairs
PROC SGPLOT can be used to plot the line using the data set
DO WHILE statements
DO WHILE statements execute a block of statements repeatedly while a condition is true
For DO WHILE the condition is evaluated at the beginning of the loop
For DO WHILE, we need to be sure that the condition will eventually stop being true
Otherwise will create an infinite loop that will never stop running Can press cancel to stop the program
DO UNTIL Statements
DO UNTIL statements execute a block of statements repeatedly until a condition is met
For DO UNTIL the condition, placed in parentheses after UNTIL, is evaluated at the bottom of the loop
Therefore the code is always executed at least once
Need to be careful when using DO UNTIL statements that the condition eventually becomes true
Otherwise will create an infinite loop that will never stop running Can press cancel to stop the program
Combining DO UNTIL/WHILE and DO Loops
We can use a DO loop that loops over an index variable while also containing an UNTIL statement that will stop the loop as soon as the condition becomes true
Also possible to combine a DO Loop with a DO WHILE statement
Then there is no possibility of an infinite loop as the loop will stop when the index variable has iterated through all of its values
LEAVE Statement
The LEAVE statement inside a DO loop ends the loop and moves to executing the statement after the END statement
LEAVE statement can also be used inside a SELECT group
CONTINUE Statement
The CONTINUE statement ends the current loop and moves to the next iteration of the loop and continues
SAS Dates
Dates are stored as the number of days from January 1, 1960
YRDIF function
returns the number of years between two dates given by the start date (DOB) and end date (Date2)
Date Constant
Suppose we want to calculate the age for everyone in the data set at a specific date
Can enter dates in a DATA step using a date constant
General form is one- or two- digit day, three character month, and two- or four-digit year in quotation marks followed by a d
For example, a date can be written as
‘28Sep2022’d
This is the only form allowed for a date constant
Date constants can be used in any expression involving dates
Date Constant for Today
Can use the TODAY function to return today’s date
DATE() is identical to TODAY()
DATDIF Function
DATDIF function returns the number of days between two dates
DATDIF function has a required third argument, basis
DATDIF(start-date, end-date, basis)
Basis used to specify how we want to count the number of days
’30/360’ uses a 30-day month and 360-day year regardless of the actual number of days in a month or year
’ACT/ACT’ uses the actual number of days
’ACT/360’ uses the actual number of days in a specific month and a 360-day year
‘ACT/365’ uses the actual number of days in a specific month and a 365-day year
YRDIF function has a similar third argument that changes how the difference in years is calculated
Default is ‘AGE’ for calculating a person’s age
Extracting Day, Month, and Year from a SAS Date
WEEKDAY function returns the day of the week with Sunday = 1
DAY function returns the day of the month
MONTH function returns the month
YEAR function returns the four-digit year value
Creating a Date from Month, Day, Year
MDY function allows you to create a SAS date using a month, day, and year values
Any missing value results in a missing date
Date Interval Function
INTCK function computes the number of interval boundaries (e.g., months, quarters, years) that are crossed between two dates
INTNX function computes a date after a given number of intervals
Partial List of Intervals for INTCK/INTNX
A partial list of the intervals used for INTCK and INTNX given by
By default SAS assumes that data values are separated by ( )
one or more blanks
The input statement contains
the variables you want to associate with each data value
A period must be separated from other values by ( )
at least one space
A common way to store data on Windows and UNIX platforms is in ( ), which use commas instead of blanks as data delimiters
Comma- separated values
( ) is a sequence of one or more characteristics that marks the beginning or end of a unit of data
delimiter
( ) after the file name is an option for infile
delimiter-sensitive data
As an alternative file reference, you may use ( ) to specify file containing data.
filename
( ) overrides DSD when used together
Delimiter specified by DLM overrides DSD when used together
For TAB key delimiter need to use the ( )
hexadecimal equivalent
Hexadecimal is a numeral system with base ( )
Standard numeral system is base ( )
Uses 16 distinct symbols for each digit, commonly with ( ) representing 0-9 and ( ) representing 10-15
Used by software developers as ( ) which range from 00000000 to 11111111 can be written as hexadecimal number from ( ) to ( )
16
10
0-9
A-F
8 bit
00 to FF
Hexadecimal values can by used in SAS statement by placing ( ) followed immediately by ( )
value in quotes followed immediately by x (no spaces)
Can use ( ) to read data directly into SAS without using an external file
datalines
You can use DATALINES with ( ) and use DATALINES as file reference in ( )
INFILE options and INFILE statement
A method for reading data in fixed columns is called
column input
Column data cannot read in data that ( ) or ( ) and can only read in dates as
has no commas or dollar signs in it and can only read in dates as character values
Formatted input reads data from fixed columns similarly to column input, but allows for ( ) (e.g. containing dollar signs or commas) and ( ) in a variety of formats
nonstandard numerical values and dates in a variety of formats
( ) sign is called a column pointer and tells which column the variable starts at
@ sign is called a column pointer and tells which column the variable starts at
( ) tells SAS that its a character and w columns
mmddyy10. tells SAS the date is in the ( ) format
Storing the date as numeric as the number of days from January 1, 1960
$w. informat tells SAS that its a character and w columns mmddyy10. tells SAS the date is in the mm/dd/yyyy format
Can use ( ) in PROC PRINT to change how values are displayed
FORMAT statement
What does dollar 11.2 indicate to do?
use dollar format, with up to 11 columns and 2 decimal places
For informats a ( ) tells SAS to use specified informat but to stop reading value at delimiter
colon
May place INFORMAT statement before ( ) statement in the DATA step
Can apply informat to ( ) by listing multiple variables with a ( )
Can also place INFORMAT statement before the INPUT statement in the DATA step
Can apply informat to multiple variables by listing multiple variables with a single informat after
( ) statement tells SAS where to find the data
INFILE STATEMENT
The order of the values in input match the order of the values in the file
T or F
True
A missing numeric value is represented by a ( ) in SAS
single period
In a CSV file each line of the file gives ( ) observation, each unit of data is ( ) and character values may ( )
one observation, separated by a comma and each character value may not be in quotes
DSD changes the ( ) and assumes ( ) delimiters and removes ( )
The default delimiter to a comma, it assumes two delimiters in a row is a missing value, and it removes quotes (single or double) from character values
( ) allows you to specify the delimiter used in the file; you may also do this ( )
DLM option to INFILE statement; you may also spell out DELIMITER instead of DLM
Example of a hexadecimal in SAS
infile 'file name' dlm= '09'x
You can read data directly into SAS without using an external file, True or False?
True
( ) statement is equivalent to DATALINES
CARDS statement
( ) must be last statement in DATA step
DATALINES
Column input can read ( ) data and ( )
This method of input can read character data and standard numeric value
After variable name is SAS ( ) which tells SAS how to read a data value
After variable name is SAS informat which tells SAS how to read a data value
( ) is used for numerics
( ) value tells SAS how many columns to read
( ) is optional and tells SAS the number of digits to the right of the decimal point
If a decimal point is already in it (decimal counts as ( ) ) then ( ) is ignored
w.d informat is used for numerics
w value tells SAS how many columns to read
d is optional and tells SAS the number of digits to the right of the decimal point
If decimal point already in it (decimal counts as column) then d is ignored
All SAS formats end in either a ( ) or a ( ) followed by a ( )
All SAS formats end in either a period or a period followed by a number
FORMAT statement allow SAS to ( ) from variable or data set names
format name
FORMAT STATEMENT changes how the values are displayed in output and how they are stored. T or F?
F, only changes how the values in output are displayed but does not change how the variables are stored
Can also place format statement in the ( a ) step instead of PROC
If placed in ( a ) step, that format will be ( b) associated with the variable
Can override format in a particular PROC by including a ( c) statement
Can write FORMAT statement with ( ) to remove a format (format variablename;)
Can also place format statement in the DATA step instead of PROC
If placed in DATA step, that format will be permanently associated with the variable
Can override format in a particular PROC by including a FORMAT statement
Can write FORMAT statement with no format specified to remove a format (format variablename;)
Can also specify formats when reading data using ( )
Can also specify formats when reading data using list input
Need informat for a name if it is over ( )
8 bytes
Need informat for date if you want it stored as ( ) and not a ( )
numeric and not a character
list.txt contains the same data, ( ) as delimiters and ( ) around character values
list.txt contains the same data, but with blank spaces as delimiters and no quotes around character values
SAS is a collection of modules
collection of modules that are used to process and analyze data
SAS is the main software used in which industry
pharmaceutical industry
SAS was developed at which university? And what was it made for
North Carolina State University to create a statistical analysis system to analyze agricultural data in the late 60s
DATA Steps do what?
Read and modify data, create SAS data set