SAS Data Manipulation with Conditional Statements and Subsetting
Operating on Selected Observations
We often need to work with specific observations in a dataset.
Selecting observations is easier with variables that identify each observation or the observation number itself.
The process involves understanding key characteristics of the data we want to subset.
Conditional Statements in SAS
The primary tool for subsetting data is the
IF-THENstatement.Basic Format:
IF <condition> THEN <action>.Condition can be a single or multiple conditions involving logical operators:
AND,OR.If the condition is TRUE, the action is performed; if FALSE, no action is taken unless paired with an
ELSEstatement.
Venn Diagrams for Multiple Conditions
Using Venn diagrams helps visualize overlaps when using multiple
ANDorORconditions.Be meticulous about your conditions to avoid missing key observations.
Examples of Using IF-THEN Statements
Single Condition Example:
IF income < 10000 THEN dummy = 1;(Flagging income below 10,000)
Handling Missing Values:
IF income = . THEN dummy = .;(Flag missing income records)
Observation Number in SAS
The observation number is an internal variable in SAS referred to as
OBS.Each dataset row corresponds to an observation number, which is not visible in the dataset but can be used in conditions.
N_is utilized as a syntax in conditions:IF _N_ <= 10 THEN include = 0;(Flagging the first ten observations for inclusion).
Comparison Operators in SAS
Commonly used operators:
=(equal)<>(not equal)>(greater than)<(less than)>=(greater than or equal to)<=(less than or equal to)
Using IN Conditions
You can use the IN operator to check membership within a set:
IF sex IN ('M', 'F') THEN <action>;(Checking for male or female).
Subsetting Data
You can create new datasets by keeping certain observations based on conditions.
Example of keeping males only:
DATA males;
SET original_dataset;
IF sex = 'M';
Deleting observations can also be done with conditions:
IF income = . THEN DELETE;
Printing Data with PROC SORT
You can sort datasets by multiple variables:
PROC SORT DATA=salary; BY rank salary;(Sorting by rank then by salary).Use the
DESCENDINGoption to sort in reverse order for numerical values.
Importance of Logical Clarity
When dealing with logical conditions, ensure clarity:
Utilizing parentheses can help structure your conditions correctly.
Always close brackets that you open to maintain structure in your code.
SAS Data Input Techniques
Multiple methods exist for importing data into SAS.
The INFILE statement specifies where to read the data from, while INPUT defines how to parse the data.
The options like
MISSOVERorTRUNCOVERcontrol the behavior of how SAS treats lines that are shorter than expected.
Labeling Variables in SAS
Labels provide descriptive references for variables and can be added using:
LABEL variable_name = 'Human-readable name';
Labels appear in outputs and can clarify the meaning of data in reports.
Summary of Concepts
Conditional Statements: Use to manipulate observations based on criteria.
Data Subset: Create new datasets by filtering existing ones.
Logical Clearances: Always double-check logical statements to ensure accuracy.
Data Input Techniques: Understanding how to correctly input data is crucial to avoid errors.
Labels: Enhances understanding of the dataset published results without altering the base data.
Looking Forward
Next sessions will continue into more advanced topics including handling date and time formats in SAS programming.
Stay tuned for further lectures!
Additional Important Syntax in SAS
Creating New Variables
Use the
DATAstep to create new variables based on existing ones:
DATA new_dataset;
SET original_dataset;
new_variable = existing_variable * 2; /* Example: doubling a variable */
Using DO Loops
For iterative processes,
DOloops can be utilized:
DO i = 1 TO 10;
/* actions to perform 10 times */
END;
Merging Datasets
Merging datasets is achieved with:
DATA merged_dataset;
MERGE dataset1 dataset2;
BY common_variable;
Formatting Variables
To format output of variables, use:
FORMAT variable_name format_type.;
Example:
FORMAT income DOLLAR8.; /* Format income as currency */
Exporting Data to CSV
To export datasets to a CSV file:
PROC EXPORT DATA=dataset_name
OUTFILE="path\to\file.csv"
DBMS=CSV REPLACE;
RUN;
Creating New Variables
You can create new variables from existing data using the
DATAstep:
DATA new_dataset;
SET original_dataset;
new_variable = existing_variable * 2; /* Example: doubling a variable */
Using DO Loops
To perform iterative actions, utilize
DOloops:
DO i = 1 TO 10;
/* actions to perform 10 times */
END;
Merging Datasets
You can combine datasets using:
DATA merged_dataset;
MERGE dataset1 dataset2;
BY common_variable;
Formatting Variables
To specify how variables should appear in output, use:
FORMAT variable_name format_type.;
Example:
FORMAT income DOLLAR8.; /* Format income as currency */
Exporting Data to CSV
To send datasets to a CSV file, use:
PROC EXPORT DATA=dataset_name
OUTFILE="path\to\file.csv"
DBMS=CSV REPLACE;
RUN;