Pandas DataFrames and Series Explained

Pandas DataFrames

Introduction to Pandas DataFrames

Pandas DataFrames are two-dimensional data structures in the pandas package for Python.
They can hold heterogeneous data, with each column having its own data type.
They are built upon pandas Series: each column of a DataFrame is a Series.

Pandas Series

One-dimensional data objects in pandas.
Used to build DataFrames.

Importing Libraries

Import NumPy: import numpy as np
Import pandas: import pandas as pd

Creating a Series

Similar to NumPy arrays but with the ability to define custom index labels.
Example:
my_series = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

Creating Series using Dictionaries

Dictionaries can be used to create series where keys become index values and values become data values.
```
my_dict = {'a': 2, 'b': 4, 'c': 6, 'd': 8}
my_series = pd.Series(my_dict)
```

Accessing Values

Values can be accessed by label or by integer position.

my_series['a']       # Access by label
my_series.iloc[0]    # Access by integer position (plain my_series[0] is deprecated when the index holds labels)

Slicing Series

Slicing returns both values and labels. A label-based slice includes its end label.
```
my_series['a':'c']
```
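As a small sketch of the difference between label-based and position-based slicing, the example below rebuilds the series from earlier in these notes. The names by_label and by_position are only illustrative.

```python
import pandas as pd

my_series = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' is included in the result.
by_label = my_series['a':'c']

# Position-based slice: the end position 2 is excluded, as with Python lists.
by_position = my_series.iloc[0:2]

print(by_label.tolist())     # [1, 2, 3]
print(by_position.tolist())  # [1, 2]
```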

Mathematical Operations

Operations are performed element-wise based on labels.

my_series + my_series # Adds elements with matching labels

Mismatched labels result in NaN (Not a Number) values.
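A minimal sketch of label alignment, using two made-up series (s1 and s2 are illustrative names) whose labels only partly overlap. The add method with fill_value is one way to avoid the NaN values, assuming a missing entry should count as 0.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are paired by label, not by position.
# 'a' and 'd' appear in only one series, so their results are NaN.
total = s1 + s2

# Treat a missing label as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```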

NumPy Functions

Many NumPy functions work on series.
```
np.mean(my_series)
```
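As a brief illustration with made-up values, element-wise NumPy functions such as np.sqrt return a new series that keeps the original labels, while reductions such as np.mean return a single number.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1, 4, 9, 16], index=['a', 'b', 'c', 'd'])

# Element-wise function: the result is still a Series with the same labels.
roots = np.sqrt(my_series)

# Reduction: the result is a single number.
average = np.mean(my_series)

print(roots)
print(average)  # 7.5
```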

Pandas DataFrames

Two-dimensional tables with labeled columns that can hold different types of data.
The Python counterpart of tables like those in Excel or SQL databases.
The standard data structure for tabular data in Python.

Creating DataFrames

DataFrames can be created from dictionaries, two-dimensional NumPy arrays, or series by passing them to pd.DataFrame().

When a dictionary is used, its keys become column names and its values populate the columns.

data = {
    'name': ['Joe', 'Bob', 'Franz'],
    'age': np.array([20, 21, 19]),
    'weight': (150, 160, 145),
    'height': pd.Series([5.8, 5.9, 6.0], index=['Joe', 'Bob', 'Franz']),
    'siblings': 1,
    'gender': 'm'
}
df = pd.DataFrame(data)

Column Creation

Different sequence data structures (lists, NumPy arrays, series, tuples) can be used to populate columns.
Single values will fill the entire column.

Row Index

If a pandas series with an index is used, that index will be used as the row index for the DataFrame.
Otherwise, numerical indices are used by default.
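The two cases can be sketched with small made-up DataFrames (default_df and labeled_df are illustrative names):

```python
import pandas as pd

# Plain list: pandas assigns the default integer index 0, 1, 2.
default_df = pd.DataFrame({'age': [20, 21, 19]})

# Series with its own index: that index becomes the row index.
labeled_df = pd.DataFrame({
    'height': pd.Series([5.8, 5.9, 6.0], index=['Joe', 'Bob', 'Franz'])
})

print(list(default_df.index))  # [0, 1, 2]
print(list(labeled_df.index))  # ['Joe', 'Bob', 'Franz']
```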

Custom Row Labels

Custom row labels can be provided during DataFrame construction using the index parameter:
```
df = pd.DataFrame(data, index=['Joe', 'Bob', 'Franz'])
```

Accessing and Modifying DataFrames

DataFrames behave like dictionaries of Pandas Series objects.

Accessing Columns

Columns can be accessed using dictionary-like indexing or dot notation.
```
df['weight']
df.weight
```

Deleting Columns

Columns can be deleted using the del keyword.
```
del df['weight']
```

Adding Columns

New columns can be added like adding new objects to a dictionary.
```
df['IQ'] = [120, 130, 140]
```
If a series is inserted, its values are matched to rows by index label; rows with no matching label get NaN.
A list, tuple, or array must have the same length as the DataFrame, otherwise pandas raises a ValueError. A single value is the exception: it fills the entire column.
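A minimal sketch of what happens when a column is assigned from a single value, from a series, and from a list of the wrong length. The small DataFrame and the name length_error are illustrative, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 21, 19]}, index=['Joe', 'Bob', 'Franz'])

# A single value is repeated for every row.
df['siblings'] = 1

# A Series is aligned by label: order does not matter,
# and 'Franz' has no matching label, so it gets NaN.
df['IQ'] = pd.Series([130, 120], index=['Bob', 'Joe'])

# A list with the wrong number of items is rejected.
length_error = None
try:
    df['weight'] = [150, 160]
except ValueError as err:
    length_error = str(err)

print(df)
print(length_error)
```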

Indexing with .loc and .iloc

.loc is used for label-based indexing.

df.loc['Joe']                        # Get row with label 'Joe'
df.loc['Joe', 'IQ']                  # Get value at row 'Joe', column 'IQ'
df.loc['Joe':'Bob', 'age':'IQ']      # Slice rows and columns by label (both ends included; the labels must exist)

.iloc is used for integer-based indexing.

df.iloc[0]          # Get row at index 0
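Like .loc, .iloc also accepts a row position and a column position, as well as slices. A short sketch with a made-up DataFrame (first_row, cell, and block are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [20, 21, 19], 'IQ': [120, 130, 140]},
    index=['Joe', 'Bob', 'Franz'],
)

first_row = df.iloc[0]      # Row at position 0, returned as a Series
cell = df.iloc[0, 1]        # Row 0, column 1 (Joe's IQ)
block = df.iloc[0:2, 0:1]   # Rows 0-1 and column 0; end positions are excluded

print(first_row)
print(cell)   # 120
print(block)
```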

Logical Indexing

Rows can be selected using a sequence of boolean values (logical index).
```
bool_index = [False, True, True]
df[bool_index]
```
Logical indexing is useful for subsetting data based on comparisons.
```
df[df['age'] > 12]
```
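A comparison on a column produces a boolean series, and several comparisons can be combined with & (and) or | (or). The sketch below uses a made-up DataFrame; note that each comparison needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [20, 21, 19], 'weight': [150, 160, 145]},
    index=['Joe', 'Bob', 'Franz'],
)

# The comparison itself is a boolean Series, one value per row.
mask = df['age'] > 19
older = df[mask]

# Combine conditions with & and wrap each one in parentheses.
older_and_lighter = df[(df['age'] > 19) & (df['weight'] < 160)]

print(mask.tolist())                  # [True, True, False]
print(list(older_and_lighter.index))  # ['Joe']
```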

Exploring DataFrames

Exploring a DataFrame is useful after loading data from external sources, when its size and contents are not yet known.

Loading Data

Example using the Titanic dataset:

titanic_train = pd.read_csv('titanic_train.csv')
type(titanic_train)  # pandas.core.frame.DataFrame

Loading data will be covered in the next lesson.

DataFrame Size

.shape attribute shows the dimensions of the DataFrame.
```
titanic_train.shape  # (891, 12): 891 rows, 12 columns
```

Viewing Rows

.head(n) shows the first n rows.
.tail(n) shows the last n rows.
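Since the Titanic file may not be at hand, here is a small sketch of .shape, .head(), and .tail() on a made-up ten-row DataFrame (numbers is an illustrative name):

```python
import pandas as pd

numbers = pd.DataFrame({'n': range(10)})

print(numbers.shape)    # (10, 1)

first_three = numbers.head(3)   # Rows at positions 0, 1, 2
last_two = numbers.tail(2)      # Rows at positions 8, 9

print(first_three)
print(last_two)
```

Called without an argument, both methods return five rows.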

Index Column

Check the DataFrame index with df.index.

A column can be converted to the index, after which the original column can be removed:

titanic_train.index = titanic_train['Name']
del titanic_train['Name']
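The same result can be reached in one step with set_index, which returns a new DataFrame and removes the column by default. A sketch on a made-up table (people and indexed are illustrative names):

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Joe', 'Bob', 'Franz'],
    'Age': [20, 21, 19],
})

# Move the 'Name' column into the index; the original DataFrame is unchanged.
indexed = people.set_index('Name')

print(list(indexed.index))    # ['Joe', 'Bob', 'Franz']
print(list(indexed.columns))  # ['Age']
```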

Getting Column Names

.columns attribute shows the column names.
```
titanic_train.columns
```

Summary Statistics

.describe() method shows summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for numeric columns.
```
titanic_train.describe()
```
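To show the shape of the result without the Titanic file, here is a sketch on a made-up DataFrame with one text column and one numeric column. Only the numeric column appears in the summary.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Joe', 'Bob', 'Franz'],
    'age': [20, 21, 19],
})

summary = df.describe()

print(summary)
print(summary.loc['mean', 'age'])  # 20.0
```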

NumPy Functions on DataFrames

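NumPy functions can be applied to a DataFrame; axis=0 computes the result down each column. Select the numeric columns first, because averaging text columns raises an error in recent pandas versions.
```
np.mean(titanic_train.select_dtypes(include='number'), axis=0)
```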
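The effect of the axis argument can be sketched on a small, fully numeric DataFrame (scores is a made-up example): axis=0 gives one result per column, axis=1 gives one result per row.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'math': [80, 90, 100],
    'art': [70, 60, 50],
})

# axis=0: collapse the rows, leaving one mean per column.
column_means = np.mean(scores, axis=0)

# axis=1: collapse the columns, leaving one mean per row.
row_means = np.mean(scores, axis=1)

print(column_means)
print(row_means)
```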

DataFrame Information

.info() method shows a summary of the DataFrame structure: column names, non-null counts, data types, and memory usage.
```
titanic_train.info()
```
