Chapter 2: CLI (Pipes)

Terminal and Commands Introduction

  • Opening Terminal

    • Start the terminal

    • Use command pwd to check current working directory (should be CLI files)

Working with the iris.csv Dataset

  • Navigating to the Dataset

    • Use cd to move into the lesson 02 directory

  • Examining the Contents of iris.csv

    • Use command cat iris.csv

    • Dataset Overview:

      • Each data point is labeled with one of three classes, one per species of iris flower:

        • Iris virginica

        • Iris versicolor

        • Iris setosa

      • The dataset itself is not the focus of the analysis; it serves as a case study

Measuring Data Points

  • Counting Lines

    • Use command wc iris.csv

    • Result:

      • wc prints the line, word, and byte counts; the first number is the line count

      • Total of 150 lines, one data point per line
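
The plain wc call prints three numbers, and only the first one is the line count. The following is a minimal sketch, assuming a small hypothetical stand-in file (sample.csv in a temporary directory) instead of the real iris.csv; it shows how wc -l isolates the line count.

```shell
# Build a tiny stand-in for iris.csv (hypothetical sample rows, no header).
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
EOF

# Plain wc prints lines, words, and bytes, followed by the file name.
wc "$workdir/sample.csv"

# wc -l prints only the line count. Reading from stdin omits the file name;
# tr strips the padding that some wc implementations add.
line_count=$(wc -l < "$workdir/sample.csv" | tr -d ' ')
echo "lines: $line_count"

rm -r "$workdir"
```

On the real dataset, the equivalent call would be wc -l iris.csv.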

Checking for Header in the CSV file

  • Verifying Header Presence

    • Use command head iris.csv

    • Output:

      • Displays the first 10 lines (the default for head); the first line is already a data row, confirming no header is present
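
A header check only needs the first line, which head -n 1 returns. The sketch below uses a hypothetical sample.csv, and the "first field starts with a digit" test is an illustrative heuristic assumed here, not something the lesson prescribes.

```shell
# Hypothetical stand-in for iris.csv with three data rows and no header.
workdir=$(mktemp -d)
printf '%s\n' \
  '5.1,3.5,1.4,0.2,setosa' \
  '4.9,3.0,1.4,0.2,setosa' \
  '4.7,3.2,1.3,0.2,setosa' > "$workdir/sample.csv"

# head prints the first 10 lines by default; -n 1 limits it to the first line.
first_line=$(head -n 1 "$workdir/sample.csv")
echo "$first_line"

# Assumed heuristic: a header row would start with a column name, while a
# data row in this file starts with a digit.
case "$first_line" in
  [0-9]*) has_header=no ;;
  *)      has_header=yes ;;
esac
echo "header present: $has_header"

rm -r "$workdir"
```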

Using Pipe Symbol with Commands

  • Alternative Methods of Counting Lines

    • Combine commands: cat iris.csv | wc

      • Explanation of the Pipe Symbol (|):

        • Takes the output of cat and uses it as the input for wc

        • Each command does its own job and knows nothing about the other; the pipe connects them

      • Significance:

        • Small commands can be combined flexibly, with no command hardcoded to depend on another
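
To see that the pipe only connects standard output to standard input, the sketch below counts lines two ways and compares the results. The file sample.csv and its contents are hypothetical.

```shell
# Hypothetical four-line file.
workdir=$(mktemp -d)
printf '%s\n' 'a,1' 'b,2' 'c,3' 'd,4' > "$workdir/sample.csv"

# The shell redirects the file into wc's standard input.
direct=$(wc -l < "$workdir/sample.csv" | tr -d ' ')

# cat writes the file to standard output; the pipe feeds that into wc.
piped=$(cat "$workdir/sample.csv" | wc -l | tr -d ' ')

echo "direct: $direct, piped: $piped"

rm -r "$workdir"
```

Both counts agree because wc only sees a stream of lines; it does not know or care whether they came from a file or from cat.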

Understanding grep Command

grep searches text files for a specified string or pattern and returns the lines that contain it. It is the standard tool for finding data within files, and it lets users filter large datasets efficiently.

  • Searching Within Files

    • Use command grep "setosa" iris.csv

      • Purpose: Find lines containing setosa

      • Output: Only lines containing the word setosa are returned

  • Regular Expressions in grep

    • The literal string setosa is itself a basic regex

      • Learning more about regex is encouraged, but it is outside the scope of this topic
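
The grep behavior described above can be sketched on a small hypothetical file (sample.csv is an assumed name, not the real dataset). The ^ anchor and the -c option are standard grep features shown for illustration.

```shell
# Hypothetical stand-in for iris.csv.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
5.0,3.6,1.4,0.2,setosa
6.3,3.3,6.0,2.5,virginica
EOF

# Print only the lines that contain the string setosa.
matches=$(grep "setosa" "$workdir/sample.csv")
echo "$matches"

# The pattern is a regular expression: ^ anchors the match to the start of
# the line, so this prints the lines that begin with 5.
grep "^5" "$workdir/sample.csv"

# -c prints the number of matching lines instead of the lines themselves.
match_count=$(grep -c "setosa" "$workdir/sample.csv")
echo "setosa lines: $match_count"

rm -r "$workdir"
```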

Chaining Commands Effectively

  • Using cat and grep Together

    • Command: cat iris.csv | grep "setosa"

      • Achieves the same outcome as the previous grep command

  • Counting Specific Lines

    • Chain to count occurrences:

      • Command: cat iris.csv | grep "setosa" | wc

      • Outcome: Total of 50 setosa data points

      • Explanation: wc counts only the lines that grep selected, so the chain works as a filter-then-count utility
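
The filter-then-count chain can be tried on a hypothetical five-row file (sample.csv is an assumed stand-in). The sketch also shows grep -c as a shorter route to the same number, and a loop that repeats the count for each class.

```shell
# Hypothetical stand-in for iris.csv: 2 setosa, 1 versicolor, 2 virginica.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
EOF

# Select the setosa lines, then count them (wc -l prints the line count only).
setosa_count=$(cat "$workdir/sample.csv" | grep "setosa" | wc -l | tr -d ' ')
echo "setosa via pipe: $setosa_count"

# grep -c gives the same number without the extra wc stage.
grep_c_count=$(grep -c "setosa" "$workdir/sample.csv")
echo "setosa via grep -c: $grep_c_count"

# Repeat the count for every class.
for species in setosa versicolor virginica; do
  echo "$species: $(grep -c "$species" "$workdir/sample.csv")"
done

rm -r "$workdir"
```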

Additional Chaining Examples

  • Searching for Numerical Values

    • Command: cat iris.csv | grep "3.5"

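      • Outputs lines containing the numerical value 3.5; because the pattern is a regex, the dot matches any character, so text such as 3,5 also matches (use grep -F "3.5" or grep "3\.5" for a strictly literal match)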

  • Combining Multiple Filters

    • Command: cat iris.csv | grep "setosa" | grep "3.5"

      • Filters for setosa lines that also include 3.5

  • Counting Filtered Lines

    • Command: cat iris.csv | grep "setosa" | grep "3.5" | wc

      • Counts the lines that meet both criteria, giving a quick summary of the data

      • Output: 6 lines matching both filters
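
One detail worth checking when filtering on numbers: in a regex the dot matches any character, so "3.5" can match more than the literal value. The sketch below uses a hypothetical sample.csv chosen so that one row contains 3,5 across a field boundary; grep -F treats the pattern as a fixed string.

```shell
# Hypothetical stand-in for iris.csv. The virginica row contains "3,5"
# (the end of 3.3 followed by the start of 5.7).
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.0,3.5,1.3,0.3,setosa
6.7,3.3,5.7,2.1,virginica
6.9,3.1,4.9,1.5,versicolor
EOF

# Regex match: the dot is a wildcard, so the "3,5" row is counted too.
regex_count=$(cat "$workdir/sample.csv" | grep "3.5" | wc -l | tr -d ' ')

# Fixed-string match: only a literal 3.5 counts.
literal_count=$(cat "$workdir/sample.csv" | grep -F "3.5" | wc -l | tr -d ' ')

# Two filters chained: setosa lines that also contain a literal 3.5.
both_count=$(cat "$workdir/sample.csv" | grep "setosa" | grep -F "3.5" | wc -l | tr -d ' ')

echo "regex: $regex_count, literal: $literal_count, both filters: $both_count"

rm -r "$workdir"
```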

Listing Files with Specific Extensions

  • Finding CSV Files

    • Command: ls | grep .csv

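      • Outputs every file name in the directory that contains csv preceded by any character; the unquoted dot is a regex wildcard, so use grep '\.csv$' to match only names that end in .csv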
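
The difference between the loose pattern and an anchored one can be sketched in a temporary directory. The file names below are hypothetical, picked so that one non-CSV name still matches the loose pattern.

```shell
# Hypothetical directory contents.
workdir=$(mktemp -d)
touch "$workdir/iris.csv" "$workdir/notes.txt" "$workdir/mycsv_notes.txt"

# Loose pattern: the dot matches any character, so mycsv_notes.txt also
# matches (the "y" before "csv" satisfies the dot).
loose=$(ls "$workdir" | grep .csv)
echo "loose:"
echo "$loose"

# Escaping the dot and anchoring with $ keeps only names ending in .csv.
strict=$(ls "$workdir" | grep '\.csv$')
echo "strict:"
echo "$strict"

rm -r "$workdir"
```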

Exercises for Practice

  • Experiment with various combinations of commands:

    • Filter lines containing versicolor and 2.0

    • Practice counting these filtered results

  • Importance of mastering these commands:

    • Essential for efficiently working with large datasets

    • Often quicker than writing a custom script for one-off questions about the data