SAS Data Manipulation with Conditional Statements and Subsetting

Operating on Selected Observations

  • We often need to work with specific observations in a dataset.

  • Selecting observations is easier with variables that identify each observation or the observation number itself.

  • The process involves understanding key characteristics of the data we want to subset.

Conditional Statements in SAS

  • The primary tool for subsetting data is the IF-THEN statement.

  • Basic Format: IF <condition> THEN <action>.

    • Condition can be a single or multiple conditions involving logical operators: AND, OR.

    • If the condition is TRUE, the action is performed; if FALSE, no action is taken unless paired with an ELSE statement.

Venn Diagrams for Multiple Conditions
  • Using Venn diagrams helps visualize overlaps when using multiple AND or OR conditions.

  • Be meticulous about your conditions to avoid missing key observations.

Examples of Using IF-THEN Statements

  • Single Condition Example:

    • IF income < 10000 THEN dummy = 1; (Flagging income below 10,000)

  • Handling Missing Values:

    • IF income = . THEN dummy = .; (Flag missing income records)

Observation Number in SAS

  • The observation number is an internal variable in SAS referred to as OBS.

  • Each dataset row corresponds to an observation number, which is not visible in the dataset but can be used in conditions.

  • N_ is utilized as a syntax in conditions: IF _N_ <= 10 THEN include = 0; (Flagging the first ten observations for inclusion).

Comparison Operators in SAS

  • Commonly used operators:

    • = (equal)

    • <> (not equal)

    • > (greater than)

    • < (less than)

    • >= (greater than or equal to)

    • <= (less than or equal to)

Using IN Conditions

  • You can use the IN operator to check membership within a set:

    • IF sex IN ('M', 'F') THEN <action>; (Checking for male or female).

Subsetting Data

  • You can create new datasets by keeping certain observations based on conditions.

  • Example of keeping males only:

  DATA males;
  SET original_dataset;
  IF sex = 'M';
  • Deleting observations can also be done with conditions:

  IF income = . THEN DELETE;  

Printing Data with PROC SORT

  • You can sort datasets by multiple variables:

    • PROC SORT DATA=salary; BY rank salary; (Sorting by rank then by salary).

    • Use the DESCENDING option to sort in reverse order for numerical values.

Importance of Logical Clarity

  • When dealing with logical conditions, ensure clarity:

    • Utilizing parentheses can help structure your conditions correctly.

    • Always close brackets that you open to maintain structure in your code.

SAS Data Input Techniques

  • Multiple methods exist for importing data into SAS.

  • The INFILE statement specifies where to read the data from, while INPUT defines how to parse the data.

  • The options like MISSOVER or TRUNCOVER control the behavior of how SAS treats lines that are shorter than expected.

Labeling Variables in SAS

  • Labels provide descriptive references for variables and can be added using:

  LABEL variable_name = 'Human-readable name';
  • Labels appear in outputs and can clarify the meaning of data in reports.

Summary of Concepts

  • Conditional Statements: Use to manipulate observations based on criteria.

  • Data Subset: Create new datasets by filtering existing ones.

  • Logical Clearances: Always double-check logical statements to ensure accuracy.

  • Data Input Techniques: Understanding how to correctly input data is crucial to avoid errors.

  • Labels: Enhances understanding of the dataset published results without altering the base data.

Looking Forward

  • Next sessions will continue into more advanced topics including handling date and time formats in SAS programming.

Stay tuned for further lectures!

Additional Important Syntax in SAS
Creating New Variables
  • Use the DATA step to create new variables based on existing ones:

DATA new_dataset;
  SET original_dataset;
  new_variable = existing_variable * 2;  /* Example: doubling a variable */
Using DO Loops
  • For iterative processes, DO loops can be utilized:

DO i = 1 TO 10;
  /* actions to perform 10 times */
END;
Merging Datasets
  • Merging datasets is achieved with:

DATA merged_dataset;
  MERGE dataset1 dataset2;
  BY common_variable;
Formatting Variables
  • To format output of variables, use:

FORMAT variable_name format_type.;

Example:

FORMAT income DOLLAR8.;  /* Format income as currency */
Exporting Data to CSV
  • To export datasets to a CSV file:

PROC EXPORT DATA=dataset_name
  OUTFILE="path\to\file.csv"
  DBMS=CSV REPLACE;
RUN;

Creating New Variables
  • You can create new variables from existing data using the DATA step:

DATA new_dataset;
  SET original_dataset;
  new_variable = existing_variable * 2;  /* Example: doubling a variable */
Using DO Loops
  • To perform iterative actions, utilize DO loops:

DO i = 1 TO 10;
  /* actions to perform 10 times */
END;
Merging Datasets
  • You can combine datasets using:

DATA merged_dataset;
  MERGE dataset1 dataset2;
  BY common_variable;
Formatting Variables
  • To specify how variables should appear in output, use:

FORMAT variable_name format_type.;

Example:

FORMAT income DOLLAR8.;  /* Format income as currency */
Exporting Data to CSV
  • To send datasets to a CSV file, use:

PROC EXPORT DATA=dataset_name
  OUTFILE="path\to\file.csv"
  DBMS=CSV REPLACE;
RUN;