Python for Data Analysis Practical Flashcards

Description and Tags

Comprehensive vocabulary flashcards covering the core Python libraries and exploratory data analysis (EDA) methods discussed in the lecture.

Last updated 3:31 PM on 6/11/26

25 Terms

New cards

NumPy

The fundamental package for scientific computing with Python, offering a powerful N-dimensional array object.

Vectorization

A feature of NumPy that allows mathematical operations on entire arrays without the need for explicit loops.
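
As a minimal sketch of the idea (the `prices` and `quantities` arrays are made-up sample data, not from the lecture), the same calculation is written once as a whole-array expression and once as an explicit loop for comparison:

```python
import numpy as np

# Hypothetical sample data for illustration only
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Vectorized: one expression operates on every element at once
totals = prices * quantities

# Loop version of the same computation, shown only for comparison
totals_loop = np.array([p * q for p, q in zip(prices, quantities)])

print(totals)
```

Both versions give the same numbers; the vectorized form is shorter and is usually much faster on large arrays.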

Broadcasting

A NumPy capability that enables arithmetic between arrays of different but compatible shapes by virtually stretching the smaller array to match the larger one.
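
A small sketch of broadcasting, using made-up arrays: a shape `(3,)` row and a shape `(2, 1)` column are each combined with a shape `(2, 3)` matrix without copying data by hand.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

# The row is applied to each row of the matrix
shifted = matrix + row

# The column is applied to each column of the matrix
offset = matrix + col

print(shifted.shape, offset.shape)
```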

Linear Algebra (NumPy)

Built-in functions within NumPy (mostly in the numpy.linalg module) used for matrix multiplication, decompositions, and eigenvalues.
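
A brief sketch of the three operations named on this card, assuming two small made-up matrices `a` and `b`; QR is used here as just one example of a decomposition.

```python
import numpy as np

a = np.array([[2.0, 0.0],
              [0.0, 3.0]])
b = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Matrix multiplication with the @ operator
product = a @ b

# Eigenvalues and eigenvectors of a square matrix
eigenvalues, eigenvectors = np.linalg.eig(a)

# QR decomposition: b is factored into q (orthonormal) and r (upper triangular)
q, r = np.linalg.qr(b)

print(product)
```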

Random Generation

Tools within NumPy for creating random numbers from various statistical distributions.
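
One way to draw from several distributions is NumPy's `Generator` interface, sketched below. The seed value and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator so the draws are reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(seed=42)

normal_draws = rng.normal(loc=0.0, scale=1.0, size=1000)   # Gaussian
uniform_draws = rng.uniform(low=0.0, high=1.0, size=5)     # uniform on [0, 1)
dice_rolls = rng.integers(low=1, high=7, size=10)          # high is exclusive

print(dice_rolls)
```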

Pandas

The primary tool for data manipulation and analysis, built on top of NumPy, known as the Data Wrangling Backbone.

DataFrame

A 2-dimensional labeled data structure with columns of potentially different data types.

Series

A 1-dimensional labeled array capable of holding any data type.
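
To make the two pandas structures concrete, here is a minimal sketch with invented names and values: a labeled Series, a DataFrame with mixed column types, and a single column pulled back out as a Series.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
ages = pd.Series([31, 45, 28], index=["ana", "ben", "cy"], name="age")

# A DataFrame: two-dimensional, each column may have its own dtype
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],     # strings
    "age": [31, 45, 28],              # integers
    "member": [True, False, True],    # booleans
})

# Selecting one column of a DataFrame returns a Series
age_column = people["age"]

print(people.dtypes)
```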

Data Cleaning (Pandas)

The process of efficiently handling missing data (NaN), duplicates, and type conversion.
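
A short sketch of the three cleaning tasks on an invented table. The column names and the choice to drop rows with a missing `city`, rather than fill them, are assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw data: one duplicate row, one missing city, numbers stored as text
raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", None],
    "temp": ["3.5", "3.5", "19.0", "21.0"],
})

cleaned = (
    raw.drop_duplicates()               # remove the repeated Oslo row
       .dropna(subset=["city"])         # drop rows where city is missing
       .astype({"temp": "float64"})     # convert text to numbers
       .reset_index(drop=True)
)

print(cleaned)
```

Where dropping rows would lose too much data, `fillna` is the usual alternative to `dropna`.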

Reshaping (Pandas)

Pandas operations for pivoting, melting, stacking, and unstacking data to restructure it for analysis.
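
The four reshaping operations can be sketched on one small invented table of scores, going from long format to wide format and back:

```python
import pandas as pd

# Hypothetical long-format data: one row per (student, subject) pair
long_df = pd.DataFrame({
    "student": ["ana", "ana", "ben", "ben"],
    "subject": ["math", "art", "math", "art"],
    "score": [90, 85, 70, 95],
})

# Pivot: long -> wide, one column per subject
wide_df = long_df.pivot(index="student", columns="subject", values="score")

# Melt: wide -> long again
back_to_long = wide_df.reset_index().melt(
    id_vars="student", var_name="subject", value_name="score"
)

# Stack moves the column labels into an inner index level; unstack reverses it
stacked = wide_df.stack()
unstacked = stacked.unstack()

print(wide_df)
```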

Time Series (Pandas)

Specialized functionality for date range generation and time series analysis.
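
A minimal sketch with made-up daily sales figures: `pd.date_range` builds the index, and resampling and a rolling window stand in as two common examples of time series analysis.

```python
import pandas as pd

# Six consecutive daily timestamps starting on an arbitrary date
days = pd.date_range(start="2024-01-01", periods=6, freq="D")

# Hypothetical daily sales indexed by date
sales = pd.Series([5, 7, 6, 8, 9, 4], index=days)

# Resample: aggregate into two-day buckets
two_day_totals = sales.resample("2D").sum()

# Rolling window: mean over the current and two previous days
rolling_mean = sales.rolling(window=3).mean()

print(two_day_totals)
```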

Scikit-learn

A robust library for machine learning and statistical modeling, covering supervised and unsupervised learning.

Consistent API (Scikit-learn)

A design in which estimators share a common interface: .fit() on every estimator, .predict() on predictors, and .transform() on transformers.
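
The shared interface can be sketched with one transformer and one predictor. `StandardScaler`, `LogisticRegression`, and the toy data are illustrative choices, not ones named in the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy, clearly separable data (invented for the example)
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Transformer: fit learns the mean and scale, transform applies them
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Predictor: fit learns the model, predict returns labels
model = LogisticRegression()
model.fit(X_scaled, y)
predictions = model.predict(X_scaled)

print(predictions)
```

Because the method names are the same across estimators, swapping in a different model usually means changing only the line that constructs it.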

Model Selection (Scikit-learn)

Comprehensive tools for evaluating and tuning models, including cross-validation, grid search over hyperparameters, and metric scoring (F1, accuracy).
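
A hedged sketch of the three tools together. The iris dataset, the k-nearest-neighbors classifier, and the candidate values for `n_neighbors` are all assumptions picked to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Built-in sample dataset, used only for illustration
X, y = load_iris(return_X_y=True)

# Cross-validation: accuracy on each of 5 folds
cv_scores = cross_val_score(
    KNeighborsClassifier(), X, y, cv=5, scoring="accuracy"
)

# Grid search: try each candidate hyperparameter value, scored by macro-averaged F1
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
    scoring="f1_macro",
)
search.fit(X, y)
best_k = search.best_params_["n_neighbors"]

print(cv_scores.mean(), best_k)
```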

Matplotlib

The foundational plotting library for Python used to create static, animated, and interactive visualizations.

Pyplot

A MATLAB-like interface within Matplotlib used for quick plotting.

Object-Oriented API (Matplotlib)

An interface providing fine-grained control over every element of a figure, such as axes, labels, and legends.
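
A minimal sketch of the object-oriented style, where labels, title, and legend are set through the `Axes` object. The data are invented, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Create the figure and axes explicitly, then configure each element
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-oriented API")
ax.legend()

# Hypothetical output file name
# fig.savefig("squares.png")
```

The quick pyplot equivalent would call `plt.plot(x, y)` and `plt.xlabel("x")` directly, letting Matplotlib track the current figure and axes implicitly.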

Seaborn

A high-level statistical visualization library built on Matplotlib and designed to produce attractive graphics with less code.

Statistical Aggregation (Seaborn)

A feature that automatically computes an aggregate estimate (the mean by default) and its confidence interval for plots such as bar plots and line plots.

df.describe()

An EDA method that generates descriptive statistics summarizing central tendency, dispersion, and shape, including Count, Mean, Std Dev, Min/Max, and Quartiles.

df.info()

Prints a concise summary of the DataFrame, including data types, memory usage, and non-null counts; used for initial data audits.
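
A small sketch of `describe()` and `info()` on an invented table. Capturing the `info()` output in a `StringIO` buffer is optional and is done here only so the text can be inspected as a string.

```python
import io

import pandas as pd

# Hypothetical data with one missing value and one non-numeric column
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, None],
    "group": ["a", "b", "a", "b"],
})

# describe() summarizes numeric columns only by default
summary = df.describe()

# info() prints to stdout by default; buf redirects the text
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print(summary)
print(info_text)
```

Note that the count in `describe()` excludes the missing value, which matches the non-null count reported by `info()`.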

df.head()

Displays the first n rows of a DataFrame (default is 5) to provide an immediate snapshot of structure and content.

df.tail()

Displays the last n rows of a DataFrame (default is 5); essential for chronological checks in time series data.

df.shape

An attribute (not a method, so it takes no parentheses) that holds a tuple giving the dimensions of the DataFrame as (rows, columns).
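
The quick-look tools on the last three cards can be sketched together on an invented ten-row table. Note that `shape` is accessed as an attribute, without parentheses.

```python
import pandas as pd

# Hypothetical table with 10 rows and 2 columns
df = pd.DataFrame({
    "day": range(1, 11),
    "value": range(10, 110, 10),
})

first_rows = df.head()      # first 5 rows by default
last_three = df.tail(3)     # last 3 rows
n_rows, n_cols = df.shape   # attribute, not a method call

print(n_rows, n_cols)
```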

df.hist()

Creates a histogram for each numerical column of a DataFrame (or for one column when called on a single Series), binning the values into intervals to visualize the distribution and reveal skewness or outliers.
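
A minimal sketch on randomly generated data. The column names, the seed, and the bin count are arbitrary, and the `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: one roughly symmetric column, one right-skewed column
df = pd.DataFrame({
    "symmetric": rng.normal(size=200),
    "skewed": rng.exponential(size=200),
})

# One histogram per numeric column; returns an array of Matplotlib Axes
axes = df.hist(bins=20)

# A single column can be plotted through its Series
single_ax = df["skewed"].hist(bins=20)
```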