Comprehensive vocabulary flashcards covering the core Python libraries and exploratory data analysis (EDA) methods discussed in the lecture.
NumPy
The fundamental package for scientific computing with Python, offering a powerful N-dimensional array object.
Vectorization
A feature of NumPy that allows mathematical operations on entire arrays without the need for explicit loops.
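As a minimal sketch of vectorization, the snippet below applies the same arithmetic once with an explicit loop and once as a single array expression. The `prices` array and the 10% markup are made-up illustration values.

```python
import numpy as np

# Hypothetical sample data: three prices
prices = np.array([10.0, 20.0, 30.0])

# Loop version: handles one element at a time in Python
looped = np.array([p * 1.1 for p in prices])

# Vectorized version: one expression applied to the whole array
vectorized = prices * 1.1

# Both approaches give the same numbers
print(np.allclose(looped, vectorized))
```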
Broadcasting
A NumPy capability that enables arithmetic operations between arrays of different but compatible shapes by virtually stretching the smaller array to match the larger one.
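A small sketch of broadcasting, using made-up arrays: a 1-D row and a single-column array are each combined with a 2x3 matrix without any manual copying.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
column = np.array([[100], [200]])   # shape (2, 1)

# The row is applied to every row of the matrix
plus_row = matrix + row

# The column is applied to every column of the matrix
plus_column = matrix + column

print(plus_row)
print(plus_column)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.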
Linear Algebra (NumPy)
Built-in functions within NumPy used for matrix multiplication, decomposition, and eigenvalues.
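One possible illustration of these functions, with two small made-up matrices: the `@` operator for matrix multiplication, `np.linalg.eig` for eigenvalues, and `np.linalg.svd` as an example decomposition.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Matrix multiplication
product = A @ B

# Eigenvalues and eigenvectors of A
eigenvalues, eigenvectors = np.linalg.eig(A)

# Singular value decomposition: B equals U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(B)
reconstructed = U @ np.diag(s) @ Vt

print(product)
print(eigenvalues)
```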
Random Generation
Tools within NumPy for creating random numbers from various statistical distributions.
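A brief sketch using NumPy's `default_rng` generator. The seed, sizes, and distribution parameters are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator makes the draws reproducible
rng = np.random.default_rng(seed=42)

normal_draws = rng.normal(loc=0.0, scale=1.0, size=1000)    # normal distribution
uniform_draws = rng.uniform(low=0.0, high=1.0, size=1000)   # uniform on [0, 1)
dice_rolls = rng.integers(low=1, high=7, size=10)           # integers 1 to 6

print(normal_draws.mean(), uniform_draws.mean(), dice_rolls)
```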
Pandas
The primary tool for data manipulation and analysis, built on top of NumPy, known as the Data Wrangling Backbone.
DataFrame
A 2-dimensional labeled data structure with columns of potentially different data types.
Series
A 1-dimensional labeled array capable of holding any data type.
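To make the DataFrame and Series definitions concrete, here is a minimal sketch with invented data: a DataFrame whose columns have different types, and a single column pulled out as a Series.

```python
import pandas as pd

# Hypothetical records with text, integer, and float columns
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy"],
    "age": [36, 41, 29],
    "score": [88.5, 92.0, 79.5],
})

# Selecting one column returns a Series that keeps the row labels
ages = df["age"]

print(df.dtypes)
print(type(ages).__name__)
```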
Data Cleaning (Pandas)
The process of efficiently handling missing data (NaN), duplicates, and type conversion.
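A short sketch of the three cleaning steps named above, on a made-up table: dropping a duplicate row, converting text to numbers, and filling a missing value. Filling with the column mean is only one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a duplicated row, numbers stored as text, one missing value
raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Lima"],
    "temp": ["12.5", "12.5", np.nan, "19.0"],
})

# Remove exact duplicate rows
clean = raw.drop_duplicates().copy()

# Type conversion: text to floating point
clean["temp"] = pd.to_numeric(clean["temp"])

# Fill the missing value with the mean of the remaining values
clean["temp"] = clean["temp"].fillna(clean["temp"].mean())

print(clean)
```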
Reshaping (Pandas)
Support for pivoting, melting, stacking, and unstacking data for analysis.
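The four reshaping operations can be sketched on a tiny invented grade table. `melt` goes from wide to long, `pivot` goes back, and `stack` and `unstack` move column labels into and out of the index.

```python
import pandas as pd

# Hypothetical wide table: one column per subject
wide = pd.DataFrame({
    "student": ["Ada", "Ben"],
    "math": [90, 75],
    "art": [85, 95],
})

# Wide to long: one row per (student, subject) pair
long = wide.melt(id_vars="student", var_name="subject", value_name="score")

# Long back to wide: subjects become columns again
back = long.pivot(index="student", columns="subject", values="score")

# Stack moves the column labels into the index; unstack reverses it
stacked = back.stack()
unstacked = stacked.unstack()

print(long)
print(back)
```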
Time Series (Pandas)
Specialized functionality for date range generation and time series analysis.
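A minimal time series sketch, assuming six days of invented sales figures: `pd.date_range` builds the index, `resample` aggregates into two-day bins, and `rolling` computes a moving average.

```python
import pandas as pd

# Six consecutive daily timestamps
dates = pd.date_range(start="2024-01-01", periods=6, freq="D")

# Hypothetical daily sales indexed by date
sales = pd.Series([5, 7, 6, 8, 9, 4], index=dates)

# Sum the values in two-day bins
two_day_totals = sales.resample("2D").sum()

# Three-day moving average (the first two entries have no full window)
rolling_mean = sales.rolling(window=3).mean()

print(two_day_totals)
print(rolling_mean)
```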
Scikit-learn
A robust library for machine learning and statistical modeling, covering supervised and unsupervised learning.
Consistent API (Scikit-learn)
A design where estimators share the same interface: .fit() to learn from data, plus .predict() for models and .transform() for preprocessing steps.
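The shared interface can be sketched as follows. The iris dataset and the particular estimators are arbitrary choices for illustration; the point is that the same method names work across them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A preprocessing step: fit, then transform
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Two different models: the same fit and predict calls work for both
predictions = {}
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)):
    model.fit(X_scaled, y)
    predictions[type(model).__name__] = model.predict(X_scaled)

print({name: preds[:5] for name, preds in predictions.items()})
```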
Model Selection (Scikit-learn)
Comprehensive tools for evaluating and tuning models, including Cross-validation, Grid Search for Hyperparameters, and Metric Scoring (F1, Accuracy).
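A brief sketch of these tools, assuming the iris dataset and a k-nearest-neighbors classifier purely as stand-ins: `cross_val_score` for cross-validation with an accuracy metric, and `GridSearchCV` for tuning one hyperparameter with an F1 metric.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation scored by accuracy
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5, scoring="accuracy")

# Grid search over one hyperparameter, scored by macro-averaged F1
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
    scoring="f1_macro",
)
search.fit(X, y)

print(scores.mean())
print(search.best_params_)
```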
Matplotlib
The foundational plotting library for Python used to create static, animated, and interactive visualizations.
Pyplot
A MATLAB-like interface within Matplotlib used for quick plotting.
Object-Oriented API (Matplotlib)
An interface providing fine-grained control over every element of a figure, such as axes, labels, and legends.
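A minimal sketch of the object-oriented style with made-up data. Every element is set through the `fig` and `ax` objects rather than through implicit global state; the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Create the figure and axes objects explicitly
fig, ax = plt.subplots(figsize=(4, 3))

# Each element of the plot is controlled through the axes object
ax.plot(x, y, label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Quadratic growth")
ax.legend()
```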
Seaborn
A high-level statistical visualization library based on Matplotlib designed to make attractive graphics with less code.
Statistical Aggregation (Seaborn)
A feature that automatically calculates confidence intervals and means for plots.
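As one possible illustration, `sns.barplot` aggregates the rows of each group on its own. The data below is invented, and the `Agg` backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import pandas as pd
import seaborn as sns

# Hypothetical measurements in two groups
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# By default each bar shows the group mean, and the error bar
# shows a bootstrapped 95% confidence interval around it
ax = sns.barplot(data=df, x="group", y="value")
```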
df.describe()
An EDA method that generates descriptive statistics summarizing central tendency, dispersion, and shape, including Count, Mean, Std Dev, Min/Max, and Quartiles.
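A quick sketch with an invented DataFrame. By default, `describe()` summarizes only the numeric columns, so the text column is left out of the result.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "label": ["a", "b", "c", "d", "e"],
})

# Count, mean, std, min, quartiles, and max for the numeric column
summary = df.describe()

print(summary)
```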
df.info()
Prints a concise summary of the DataFrame, including data types, memory usage, and non-null counts; used for initial data audits.
df.head()
Displays the first n rows of a DataFrame (default is 5) to provide an immediate snapshot of structure and content.
df.tail()
Displays the last n rows of a DataFrame (default is 5); essential for chronological checks in time series data.
df.shape
An attribute (accessed without parentheses) that holds a tuple representing the dimensionality of the DataFrame (row and column count).
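The quick inspection tools can be sketched together on a made-up ten-row table. Note that `shape` is an attribute, so it is read without parentheses, while `head`, `tail`, and `info` are method calls.

```python
import pandas as pd

# Hypothetical data: ten days of visit counts
df = pd.DataFrame({
    "day": range(1, 11),
    "visits": [3, 5, 2, 8, 7, 6, 9, 4, 1, 10],
})

first_rows = df.head()       # first 5 rows by default
last_three = df.tail(3)      # last 3 rows
n_rows, n_cols = df.shape    # attribute, so no parentheses

# Prints column types, non-null counts, and memory usage
df.info()
```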
df.hist()
Creates a histogram for each numerical column of the DataFrame, binning values into intervals to visualize each distribution and reveal skewness or outliers.
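A small sketch with randomly generated data. The column names, distributions, and bin count are arbitrary; the lognormal column is chosen because it is visibly right-skewed. The `Agg` backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: a right-skewed column and a roughly flat one
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=0.5, size=500),
    "age": rng.integers(18, 65, size=500),
})

# One histogram per numeric column, returned as an array of axes
axes = df.hist(bins=20, figsize=(8, 3))

# To plot a single variable, select that column first
income_ax = df["income"].hist(bins=20)
```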