We often need to work with specific observations in a dataset.
Selecting observations is easier with variables that identify each observation or the observation number itself.
The process involves understanding key characteristics of the data we want to subset.
The primary tool for subsetting data is the IF-THEN
statement.
Basic Format: IF <condition> THEN <action>
.
Condition can be a single or multiple conditions involving logical operators: AND
, OR
.
If the condition is TRUE, the action is performed; if FALSE, no action is taken unless paired with an ELSE
statement.
Using Venn diagrams helps visualize overlaps when using multiple AND
or OR
conditions.
Be meticulous about your conditions to avoid missing key observations.
Single Condition Example:
IF income < 10000 THEN dummy = 1;
(Flagging income below 10,000)
Handling Missing Values:
IF income = . THEN dummy = .;
(Flag missing income records)
The observation number is an internal variable in SAS referred to as OBS
.
Each dataset row corresponds to an observation number, which is not visible in the dataset but can be used in conditions.
N_
is utilized as a syntax in conditions: IF _N_ <= 10 THEN include = 0;
(Flagging the first ten observations for inclusion).
Commonly used operators:
=
(equal)
<>
(not equal)
>
(greater than)
<
(less than)
>=
(greater than or equal to)
<=
(less than or equal to)
You can use the IN operator to check membership within a set:
IF sex IN ('M', 'F') THEN <action>;
(Checking for male or female).
You can create new datasets by keeping certain observations based on conditions.
Example of keeping males only:
DATA males;
SET original_dataset;
IF sex = 'M';
Deleting observations can also be done with conditions:
IF income = . THEN DELETE;
You can sort datasets by multiple variables:
PROC SORT DATA=salary; BY rank salary;
(Sorting by rank then by salary).
Use the DESCENDING
option to sort in reverse order for numerical values.
When dealing with logical conditions, ensure clarity:
Utilizing parentheses can help structure your conditions correctly.
Always close brackets that you open to maintain structure in your code.
Multiple methods exist for importing data into SAS.
The INFILE statement specifies where to read the data from, while INPUT defines how to parse the data.
The options like MISSOVER
or TRUNCOVER
control the behavior of how SAS treats lines that are shorter than expected.
Labels provide descriptive references for variables and can be added using:
LABEL variable_name = 'Human-readable name';
Labels appear in outputs and can clarify the meaning of data in reports.
Conditional Statements: Use to manipulate observations based on criteria.
Data Subset: Create new datasets by filtering existing ones.
Logical Clearances: Always double-check logical statements to ensure accuracy.
Data Input Techniques: Understanding how to correctly input data is crucial to avoid errors.
Labels: Enhances understanding of the dataset published results without altering the base data.
Next sessions will continue into more advanced topics including handling date and time formats in SAS programming.
Stay tuned for further lectures!
Use the DATA
step to create new variables based on existing ones:
DATA new_dataset;
SET original_dataset;
new_variable = existing_variable * 2; /* Example: doubling a variable */
For iterative processes, DO
loops can be utilized:
DO i = 1 TO 10;
/* actions to perform 10 times */
END;
Merging datasets is achieved with:
DATA merged_dataset;
MERGE dataset1 dataset2;
BY common_variable;
To format output of variables, use:
FORMAT variable_name format_type.;
Example:
FORMAT income DOLLAR8.; /* Format income as currency */
To export datasets to a CSV file:
PROC EXPORT DATA=dataset_name
OUTFILE="path\to\file.csv"
DBMS=CSV REPLACE;
RUN;
You can create new variables from existing data using the DATA
step:
DATA new_dataset;
SET original_dataset;
new_variable = existing_variable * 2; /* Example: doubling a variable */
To perform iterative actions, utilize DO
loops:
DO i = 1 TO 10;
/* actions to perform 10 times */
END;
You can combine datasets using:
DATA merged_dataset;
MERGE dataset1 dataset2;
BY common_variable;
To specify how variables should appear in output, use:
FORMAT variable_name format_type.;
Example:
FORMAT income DOLLAR8.; /* Format income as currency */
To send datasets to a CSV file, use:
PROC EXPORT DATA=dataset_name
OUTFILE="path\to\file.csv"
DBMS=CSV REPLACE;
RUN;