1/72
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
operator for filtering: “does not equal”
^= or ~= or ne
SAS date constant (to use when filtering)
“ddmmmyyy”d
format: w.d
w: width of output field
d: number of digits to right of decimal point
format: COMMAw.d
writes numeric values with comma(s) and decimal
format: DOLLARw.d
writes numeric values with leading dollar sign, comma(s), and decimal
*width includes dollar sign character
format: DATE7.
ex. March 15, 2018
ddmmmyy
ex. 15MAR18
format: DATE9
ex. March 15, 2018
ddmmmyyyy
ex. 15MAR2018
format: MMDDYYw.
ex. May 2, 2018 using MMDDYY10.
mmdd<yy>yy or mm/dd/<yy>yy (all integers)
ex. MMDDYY10. —> 05/02/2018
format: DDMMYYw.
ex. May 15, 2018 using DDMMYY8.
ddmm<yy>yy or dd/mm/<yy>yy (all integers)
ex. DDMMYY8. —> 15/05/18
PUTLOG _ALL_
Writes the contents of all the variables of the PDV to the SAS log
PUTLOG column=
Writes a single column value to the SAS log
PUTLOG “message”
Writes a custom message to the SAS log
PUTLOG “NOTE: message”
Displays the keyword NOTE followed by your custom log message in blue text in the SAS log
implicit OUTPUT
By default, the end of a DATA step causes an implicit OUTPUT, which writes the contents of the PDV to the output table
explicit OUTPUT
controls when and where each row is written
disables implicit OUTPUT at end of DATA step
syntax for sending output to multiple tables
DATA table1 <table2 ...>;
OUTPUT table1 <table2 ...>;
syntax for controlling column output to multiple tables
in the DATA statement:
table (DROP=col1 col2…);
table (KEEP=col1 col2…);
compile-time only statements
DROP, LENGTH, WHERE, KEEP, FORMAT
how to control which columns are read into the PDV in the first place
DROP= or KEEP= options added in the SET statement
*if a column is not read into the PDV, it’s not available for processing during the DATA step
why does the default DATA step need to be modified to create an accumulating column?
by default, all computed columns are reset to missing at the beginning of each subsequent iteration of the DATA step
syntax for retaining values in PDV to create accumulating columns
RETAIN column <initial-value>;
column=column+expression;
OR column=sum(column, column2)
keeps values, not set to missing
can set initial value before execution
sum function ignores missing values
syntax for using the sum statement to create accumulating columns
column+expression;
initial value = 0
ignores missing values
syntax/method for processing data in groups
1) use PROC SORT to sort the grouping column(s)
2) the BY statement in the DATA step indicates how the data has been grouped
3) can use if-then logic using first.bycol and last.bycol
syntax for subsetting if statement used when processing data in groups
IF expression;
affects which rows are written to output from PDV
if true, remaining statements are executed (including explicit or implicit OUTPUT)—otherwise, move to next iteration of DATA step
syntax for calculating the mean of multiple columns
mean(Quiz1, Quiz2, Quiz3)
mean(of Quiz1-Quiz3)
Numbered Range: col1-coln where n is a sequential number
mean(of Q:)
Name Prefix: all columns that begin with the specified character string
col--col
physical range of columns in a table
_NUMERIC_
specifies all numeric columns in table
_CHARACTER_
specifies all character columns in table
_ALL_
specifies all columns in table
syntax for call routine
CALL routine(argument-1 <, ...argument-n>);
alters column values (does not return column values)
syntax for generating random numbers
RAND('distribution', parameter1, ...parameterk)
for random uniform, distribution=’INTEGER’
parameter1 and parameter 2 are lower and upper limits
syntax for returning the largest nonmissing value
LARGEST(k, value-1 <, value-2 ...>)
k=the kth biggest value
value-1, value-2, etc. are the numbers to evaluate
syntax for rounding
ROUND(number <, rounding-unit>)
function that returns the smallest integer that is greater than or equal to the argument
CEIL (number)
function that returns the largest integer that is less than or equal to the argument
FLOOR (number)
function that returns the integer value
INT (number)
what’s the difference between using a function and a format?
a function actually changes the stored values while formatting only affects how it’s displayed
syntax for extracting date component of a datetime value
DATEPART(datetime-value)
syntax for extracting time component of a datetime value
TIMEPART(datetime-value)
SAS time
number of seconds from midnight
syntax for counting the number of intervals between a start and end date
INTCK('interval',start-date,end-date <,'method'>)
default method is discrete
‘C’ to specify continuous method
syntax for shifting date values
INTNX('interval',start,increment <,'alignment'>)
function that returns a character string with all multiple blanks in the source string converted to single blanks
COMPBL(string)
function that returns a character string with specified characters removed from the source string
COMPRESS (string <, characters>)
function that returns a character string with leading and trailing blanks removed
STRIP(string)
function that returns the nth word in a string
SCAN(string, n <, 'delimiters'>)
default delimiters: blank ! $ % & ( ) * + , - . / ; < ^ |
optional third arg: specify delimiter list
function that converts to proper case
PROPCASE(string <, 'delimiters'>)
default delimiters: blank / - ( . tab
optional second argument: specify delimiter list
function that searches for a specific substring of characters within a character string
FIND(string, substring <, 'modifiers'>)
returns index of where the substring starts
function that returns the length of a non-blank character string, excluding trailing blanks, returns 1 for a completely blank string
LENGTH(string)
function that returns the first position at which a digit is found in the string
ANYDIGIT(string)
function that returns the first position at which an alpha character is found in the string
ANYALPHA(string)
function that returns the first position at which punctuation character is found in the string
ANYPUNCT(string)
function that replaces all occurrences of a substring in a character string
TRANWRD(source, target, replacement)
source=where to search
target=what to search for
replacement=what to replace with
function that concatenates strings together, does not remove leading or trailing blanks
CAT(string1, ... stringn)
function that concatenates strings together, removes leading or trailing blanks from each string
CATS(string1, ... stringn)
function that concatenates strings together, removes leading or trailing blanks from each string, and inserts the delimiter between each string
CATX('delimiter', string1, ... stringn)
function that converts a character value to a numeric value
INPUT(source, informat)
function that converts a numeric or character value to a character value
PUT(source, format)
how to change DATESTYLE?
OPTIONS DATESTYLE=MDY;
or
OPTIONS DATESTYLE=DMY;
how to change the type of an existing column?
use the RENAME function
SET input-table (RENAME=(current-column=new-name-for-existing-col));
…
current-column=input(new-name-for-existing-col, …)
…
drop new-name-for-existing-col;
syntax for creating custom formats
PROC FORMAT;
VALUE format-name value-or-range-1 = 'formatted-value'
value-or-range-2 = 'formatted-value'
. . . ;
RUN;
what keywords can be used in the VALUE statement of PROC FORMAT?
low, high, other
when do you need to include a period in the format name of a custom format?
not when you create it, but when you use it in a FORMAT statement or in the PUT function
syntax for creating custom formats from tables
PROC FORMAT CNTLIN=input-table FMTLIB;
SELECT format-names;
RUN;
what are the required columns for an input table when creating a custom format?
Start, Label, and FmtName (End is optional)
what does FMTLIB do?
creates a report containing information about your custom formats
how to store a custom format permanently?
LIBRARY= … option in PROC FORMAT
syntax for concatenating tables
DATA output-table;
SET input-table1(rename=(current-colname=new-colname))
input-table2 ...;
RUN;
how to merge tables using DATA step?
if data needs to be sorted first:
PROC SORT DATA=input-table OUT=output-table;
BY BY-column;
RUN;
then:
DATA output-table;
MERGE input-table1 input-table2 ...;
BY BY-column(s);
RUN;
synax for merging tables and marking where values came from
DATA output-table;
MERGE input-table1(IN=var1) input-table2(IN=var2) ...;
BY BY-column(s);
RUN;
syntax for iterative DO loops
DATA output-table;
. . .
DO index-column = start TO stop <BY increment>;
. . . repetitive code . . .
END;
. . .
RUN;
syntax for conditional DO loops
DATA output-table;
SET input-table;
. . .
DO UNTIL | WHILE (expression);
. . . repetitive code . . .
<OUTPUT;>
END;
RUN;
what’s the difference between DO UNTIL and DO WHILE?
DO UNTIL checks the condition at the bottom of the DO loop, but DO WHILE checks the condition at the top. This is very important when combining a conditional loop with an iterative loop.