AH

DataFrame Methods and Merging in Pandas

Lab Exam 2 (Lab 14)

Pandas Package Overview

  • The Pandas package is a powerful tool for data manipulation and analysis in Python.

DataFrame Review

  • A DataFrame in Pandas is a two-dimensional labeled data structure with columns of potentially different types.
  • DataFrames can be thought of as a dict of Series objects.

Key DataFrame Methods

drop()

  • Purpose: To remove specified labels from rows or columns.

  • Syntax: drop(index=None, labels=None, axis=0, inplace=False)

    • index: Specify the row indices to drop.
    • labels: Specify column labels to drop.
    • axis: 0 for rows, 1 for columns.
    • inplace: If True, drop is done in place and the method returns None.
  • Example Usage:

  result = country.drop(columns=['Percentage'])
  print(result)
  • Output: Displays the DataFrame without the Percentage column.

drop_duplicates()

  • Purpose: To remove duplicate rows based on specific criteria.

  • Syntax: drop_duplicates(subset=None, inplace=False)

  • Example Usage:

  result = country.drop_duplicates(subset=['Continent'])
  print(result)
  • Output: Displays DataFrame with duplicate continents removed.

dropna()

  • Purpose: To remove missing values from the DataFrame.
  • Usage for Rows:
  result = country.dropna(axis=0)
  print(result)
  • Usage for Columns:
  result = country.dropna(axis=1)
  print(result)

sort_values()

  • Purpose: To sort the DataFrame by a specific column or row.
  • Example Usage:
  result = country.sort_values(by='Area')
  print(result)
  • Sorting in Descending Order:
  result = country.sort_values(by='Area', ascending=False)

Merging DataFrames

  • Purpose: To combine two DataFrames based on a common key or keys.
  • Syntax: DataFrame.merge(leftdataframe, rightdataframe, how='outer', on=None)
    • The how parameter defines the type of join (e.g., INNER JOIN, LEFT JOIN).

Examples of Joins

  • Inner Join:
  result = pd.merge(student, registration, how='inner', left_on='Panther ID', right_on='Student ID')
  • Left Join:
  result = pd.merge(student, registration, how='left', left_on='Panther ID', right_on='Student ID')
  • Right Join:
  result = pd.merge(student, registration, how='right', left_on='Panther ID', right_on='Student ID')
  • Outer Join:
  result = pd.merge(student, registration, how='outer', left_on='Panther ID', right_on='Student ID')

Example DataFrames for Merging

  • student:
  student = pd.DataFrame([[1,'Michael'],[2,'Henry'],[3,'Sarah'],[4,'John']], columns=['Panther ID','Name'])
  • registration:
  registration = pd.DataFrame([[1,'CSC1301', 2],[2,'CSC1302', 1],[3,'CSC2720', 1],[4,'CSC4120', 2], [5,'CSC4330',5]], columns=['Reg ID','Course', 'Student ID'])

Exercises

  • Exercise 1: Merge two DataFrames using an inner join based on member_id and member columns.

    • Example Data:

    • members:

      member_idnameemail
      1Aa@gmail.com
      2Bb@outlook.com
      3Cc@yahoo.com
    • orders:

      order_idproductpricemember
      1000Laptop8002
      1001Phone7001
    • Merge with:

    result = pd.merge(members, orders, how='inner', left_on='member_id', right_on='member')
    

Reference