7 - Data Cleaning and Preparation

20 Terms

1

dropna

Filters missing data out of a Series or DataFrame by dropping rows or columns that contain NA values; `how` and `thresh` control how many missing values trigger removal.
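
A minimal sketch of the card above, assuming a small made-up DataFrame (the column names `a` and `b` are hypothetical). It contrasts the default behavior with `how="all"` and `thresh`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely NA, row 2 has one NA.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, np.nan]})

any_na = df.dropna()               # drop rows with at least one NA
all_na = df.dropna(how="all")      # drop only rows that are entirely NA
at_least_2 = df.dropna(thresh=2)   # keep rows with >= 2 non-NA values
```

Passing `axis="columns"` applies the same rules to columns instead of rows.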

2

fillna

Fills missing values with a specified value, or by propagating neighboring values forward ('ffill') or backward ('bfill').
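
A short sketch of the three fill styles, using an invented Series. It uses the dedicated `ffill` and `bfill` methods rather than the `method=` argument of `fillna`, on the assumption that a recent pandas release is installed:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

constant = s.fillna(0.0)   # replace every NA with a fixed value
forward = s.ffill()        # carry the last valid value forward
backward = s.bfill()       # pull the next valid value backward
```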

3

isnull

Returns a boolean object of the same shape as the input Series or DataFrame, with True wherever a value is missing/NA.

4

notnull

Returns the negation of `isnull`, indicating which values are not missing/NA.
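
As an illustration of `isnull` and `notnull` together, here is a small sketch on a made-up Series; the boolean result of `notnull` is also used as a filter:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

missing = s.isnull()     # True where the value is NA
present = s.notnull()    # element-wise negation of isnull
non_missing = s[present] # boolean indexing keeps only non-NA values
```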

5

drop_duplicates

Returns a DataFrame with duplicate rows removed, keeping the first occurrence of each by default.

6

duplicated

Returns a boolean Series indicating whether each row is a duplicate of a previous row.
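
The sketch below shows `duplicated` and `drop_duplicates` side by side on an invented DataFrame (the column names `k1` and `k2` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"k1": ["one", "two", "one", "two"], "k2": [1, 1, 1, 2]})

flags = df.duplicated()          # row 2 repeats row 0
deduped = df.drop_duplicates()   # keeps the first occurrence of each row
# Compare on k1 only and keep the last occurrence instead of the first.
by_k1 = df.drop_duplicates(subset=["k1"], keep="last")
```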

7

replace

Substitutes occurrences of one value or pattern with another; accepts single values, lists, or a dict mapping old values to new ones.
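
One possible use of `replace` is turning sentinel codes into real missing values. The sentinel values -999 and -1000 below are invented for the example:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, -999.0, 2.0, -1000.0])

# Replace several sentinel values with NA in one call.
cleaned = s.replace([-999.0, -1000.0], np.nan)
# A dict gives each old value its own replacement.
mapped = s.replace({-999.0: np.nan, -1000.0: 0.0})
```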

8

rename

Renames axis labels (index or columns) in a DataFrame using a mapping or a function; returns a new DataFrame by default, or modifies the original with `inplace=True`.
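
A brief sketch of `rename` with both a function and a dict, on a made-up DataFrame whose labels are purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4]}, index=["ohio", "colorado"])

# A function transforms every index label; a dict renames selected columns.
renamed = df.rename(index=str.title, columns={"one": "first"})
```

The original `df` keeps its labels because `rename` returns a new object here.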

9

pd.cut

Bins continuous data into intervals based on explicit bin edges or a requested number of equal-width bins.
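
A minimal sketch of `pd.cut`, assuming made-up ages, bin edges, and labels. Intervals are closed on the right by default, so 25 falls in the first bin:

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
labels = ["youth", "young_adult", "middle_aged", "senior"]

# Explicit edges: (18, 25], (25, 35], (35, 60], (60, 100]
groups = pd.cut(ages, bins, labels=labels)
# An integer asks for that many equal-width bins over the data range.
equal_width = pd.cut(ages, 4)
```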

10

pd.qcut

Bins data based on sample quantiles, so each bucket holds roughly the same number of observations.
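
To illustrate the equal-count behavior, this sketch splits the integers 1 through 100 into quartiles. The data and the labels `q1` to `q4` are invented for the example:

```python
import pandas as pd

values = pd.Series(range(1, 101))

# Four quantile-based buckets, each expected to hold 25 values here.
quartiles = pd.qcut(values, 4, labels=["q1", "q2", "q3", "q4"])
counts = quartiles.value_counts()
```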

11

detect outliers

Use boolean indexing with a threshold condition (e.g., `np.abs(data) > 3` on standardized data), or flag values that lie more than a chosen number of standard deviations from the mean.
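
Both approaches can be sketched as follows. The data is invented, the cutoff of 3 is a common convention rather than a rule, and `clip` is used here as one way to cap extreme values:

```python
import numpy as np
import pandas as pd

# Assumed to be standardized already, so 3 means "3 standard deviations".
standardized = pd.Series([0.2, -1.1, 3.5, 0.7, -4.2])
extreme = standardized[np.abs(standardized) > 3]   # boolean indexing
capped = standardized.clip(lower=-3, upper=3)      # cap instead of drop

# Raw data: compute z-scores first, then apply the same threshold.
raw = pd.Series([10.0] * 20 + [100.0])
z = (raw - raw.mean()) / raw.std()
flagged = raw[np.abs(z) > 3]
```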

12

get_dummies

Converts categorical variables into dummy/indicator variables (one-hot encoding).
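
A small sketch of one-hot encoding with `pd.get_dummies`, assuming a hypothetical `key` column. `dtype=int` is passed so the indicators are 0/1 integers rather than booleans:

```python
import pandas as pd

df = pd.DataFrame({"key": ["b", "b", "a", "c"], "value": [0, 1, 2, 3]})

# One indicator column per distinct category, prefixed with "key_".
dummies = pd.get_dummies(df["key"], prefix="key", dtype=int)
# Attach the indicators to the remaining columns.
encoded = df[["value"]].join(dummies)
```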

13

split

Breaks a string into a list of pieces on a delimiter; often combined with `strip` to trim surrounding whitespace.
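
The split-then-strip pattern might look like this, first on a plain Python string and then on a Series through the `.str` accessor. The sample strings are made up:

```python
import pandas as pd

val = "a,b,  guido"
# Split on commas, then trim whitespace from each piece.
pieces = [x.strip() for x in val.split(",")]

s = pd.Series(["a, b", "c,d "])
# expand=True spreads the pieces across DataFrame columns.
parts = s.str.split(",", expand=True)
second = parts[1].str.strip()
```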

14

str.contains

Checks if each string in a Series contains a specified pattern or substring, returning a boolean Series.
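
A quick sketch of `str.contains` on invented email addresses. `na=False` is one way to make missing entries count as non-matches:

```python
import pandas as pd

emails = pd.Series(["dave@google.com", "steve@gmail.com", None])

# Missing values would otherwise propagate as NA in the result.
has_gmail = emails.str.contains("gmail", na=False)
gmail_only = emails[has_gmail]
```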

15

str.extract

Extracts the capture groups of a regex pattern from each string, returning one column per group; `str.findall` returns every match of the pattern instead.
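
As a sketch, the pattern below pulls apart invented email addresses into user, domain, and suffix. The regex is a simplified illustration, not a complete email validator:

```python
import pandas as pd

pattern = r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+)\.([A-Za-z]{2,4})"
emails = pd.Series(["dave@google.com", "rob@gmail.com"])

# One column per capture group, one row per input string.
parts = emails.str.extract(pattern)
# A list of group tuples for every match in each string.
all_matches = emails.str.findall(pattern)
```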

16

str.replace

Replaces occurrences of a pattern or substring in each string of a Series.
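
A minimal sketch of literal versus regex replacement, on made-up strings. `regex` is set explicitly because its default has differed between pandas versions:

```python
import pandas as pd

s = pd.Series(["a, b", "c,  d"])

# Literal substring replacement.
literal = s.str.replace(",", ";", regex=False)
# Regex replacement: normalize a comma plus any whitespace to ", ".
normalized = s.str.replace(r",\s*", ", ", regex=True)
```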

17

str.cat

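Concatenates strings, either joining all elements of a Series into one string or combining them element-wise with another Series, with an optional delimiter (`sep`).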
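
The two modes of `str.cat` can be sketched like this, using two invented Series:

```python
import pandas as pd

first = pd.Series(["a", "b", "c"])
second = pd.Series(["x", "y", "z"])

# No other Series: collapse everything into a single string.
joined_all = first.str.cat(sep="::")
# With another Series: concatenate element by element.
pairwise = first.str.cat(second, sep="-")
```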

18

str.upper

Converts all characters in each string of a Series to uppercase.

19

str.startswith

Returns a boolean Series indicating whether each string starts with the given prefix.

20

str.len

Returns the length of each string in a Series.
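
The last three cards (`str.upper`, `str.startswith`, `str.len`) can be tried together on one made-up Series of names:

```python
import pandas as pd

names = pd.Series(["alice", "Bob", "carol"])

upper = names.str.upper()              # uppercase every string
starts_c = names.str.startswith("c")   # case-sensitive prefix test
lengths = names.str.len()              # character count per string
```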