What is the main role of a data scientist?
To analyze and interpret complex data to help organizations make informed decisions and improve performance.
What is the purpose of Exploratory Data Analysis (EDA)?
To uncover patterns, detect anomalies, and test assumptions before formal modeling.
Name one tool commonly used for EDA.
Visualizations such as histograms and scatter plots, or summary statistics.
What is structured data?
Data organized in tables with rows and columns (e.g., spreadsheets).
What is unstructured data?
Data without a predefined format (e.g., text, images, social media posts).
Give an example of each: structured and unstructured data.
Structured: sales spreadsheet; Unstructured: customer review text.
What is data ethics?
Guidelines for collecting, analyzing, and using data responsibly.
Give an example of an ethical concern in data science.
Using facial recognition without consent.
Name one benefit of using data science in environmental conservation.
Predicting natural disasters or tracking endangered species.
What is an open-source software license?
A license allowing users to freely use, modify, and share software source code.
What are two features of Jupyter Notebooks?
1) Interactive code execution, 2) Markdown support for combining text and code.
What does the append() method do in Python lists?
Adds a single element to the end of a list.
What does the extend() method do?
Adds multiple elements (from another iterable) to the end of a list.
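A minimal sketch of the difference between append() and extend(). The list names and values are hypothetical, chosen only for illustration.

```python
# Hypothetical example lists; the values are illustrative only.
scores = [1, 2]

scores.append(3)        # adds one element to the end
scores.extend([4, 5])   # adds each element of the iterable to the end

nested = [1, 2]
nested.append([3, 4])   # the whole list is added as a single element

print(scores)  # [1, 2, 3, 4, 5]
print(nested)  # [1, 2, [3, 4]]
```

Passing a list to append() nests it, which is a common source of bugs when extend() was intended.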
What is a Python dictionary?
A collection of key-value pairs used for fast lookups.
What is the purpose of the def keyword in Python?
To define a reusable function.
What is pandas used for?
Data manipulation and analysis using DataFrames.
What is seaborn used for?
Statistical data visualization built on top of matplotlib.
What is NumPy used for?
Numerical computing and working with arrays.
What is the formula for the probability of A or B?
P(A or B) = P(A) + P(B) - P(A and B).
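The addition rule can be checked on a small example. This sketch assumes one roll of a fair six-sided die, with A = "even" and B = "greater than 3"; the events and the helper prob() are illustrative choices, not part of the card.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}   # event: roll is even
B = {4, 5, 6}   # event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Addition rule: subtract the overlap so it is not counted twice.
p_union_formula = prob(A) + prob(B) - prob(A & B)

# Direct count of outcomes in A or B, for comparison.
p_union_direct = prob(A | B)

print(p_union_formula, p_union_direct)  # 2/3 2/3
```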
When should you start the y-axis at zero?
For bar charts, because bar length encodes the value and a truncated axis exaggerates differences.
When should you include a legend?
When using color to represent categories.
What does a box plot show?
The distribution of the data based on quartiles (median and interquartile range), with outliers shown as individual points.
What does a density plot show?
A smooth curve representing data distribution.
How can you make a plot clearer in matplotlib?
Add gridlines and labels.
What are mutually exclusive events?
Events that cannot happen at the same time.
What percentage of data falls within 1, 2, and 3 standard deviations in a normal distribution?
About 68%, 95%, and 99.7%, respectively.
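These percentages can be reproduced from the normal cumulative distribution function. A minimal sketch using the standard library; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```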
What is the difference between descriptive and inferential statistics?
Descriptive statistics summarize the data at hand; inferential statistics use a sample to draw conclusions about a population.
What is a p-value?
The probability of observing results at least as extreme as yours, assuming the null hypothesis is true.
What does a confidence interval show?
A range of values likely to contain the true population parameter at a stated confidence level (e.g., 95%).
What is data preprocessing?
Preparing raw data for analysis by cleaning and transforming it.
How can missing values be handled?
By removing rows/columns with many missing values or imputing them.
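A minimal pandas sketch of both approaches, assuming a small hypothetical DataFrame. Mean imputation and the "unknown" placeholder are illustrative choices, not the only options.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, None, 40.0],
    "city": ["Oslo", "Lima", None],
})

# Option 1: remove every row that contains a missing value.
dropped = df.dropna()

# Option 2: impute. Here the numeric column gets its mean and
# the text column gets a placeholder category.
imputed = df.assign(
    age=df["age"].fillna(df["age"].mean()),
    city=df["city"].fillna("unknown"),
)

print(dropped)
print(imputed)
```

Dropping rows is simple but discards data; imputation keeps every row at the cost of introducing estimated values.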
How can categorical data be structured for analysis?
By encoding categories into numerical values (e.g., one-hot encoding).
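One-hot encoding can be sketched with pandas.get_dummies. The column name and values below are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"color": ["red", "green", "red"]})

# Each category becomes its own indicator column (color_green, color_red).
encoded = pd.get_dummies(df, columns=["color"])

print(encoded)
```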
What do .loc and .iloc do in pandas?
.loc selects by label; .iloc selects by integer position.
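A short sketch of the two indexers, assuming a hypothetical DataFrame with string row labels.

```python
import pandas as pd

# Hypothetical DataFrame indexed by name.
df = pd.DataFrame({"score": [90, 75, 60]}, index=["ann", "bob", "cy"])

by_label = df.loc["bob", "score"]   # row label "bob", column label "score"
by_position = df.iloc[1, 0]         # second row, first column

# Slices behave differently: label slices include the end label,
# position slices exclude the end position.
label_slice = df.loc["ann":"bob"]   # rows "ann" and "bob"
position_slice = df.iloc[0:1]       # row "ann" only

print(by_label, by_position)        # 75 75
```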
What is Missing Completely At Random (MCAR)?
Missingness is unrelated to any variable in the data, observed or unobserved.
What is Missing At Random (MAR)?
Missingness depends on observed data but not the missing value itself.
What is Missing Not At Random (MNAR)?
Missingness depends on the missing value itself.
What is self-selection bias?
When participants choose to join a study, leading to non-representative samples.
What is an outlier?
A data point that differs significantly from others.
What is Bayes’ Theorem used for?
Calculating conditional probabilities, i.e., updating the probability of an event given new evidence: P(A|B) = P(B|A) P(A) / P(B).
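A worked sketch of Bayes' Theorem for a diagnostic test. All three input rates are assumed numbers chosen for illustration, not real data.

```python
from fractions import Fraction

# Assumed, illustrative rates (not real data).
p_disease = Fraction(1, 100)            # P(D): prevalence
p_pos_given_disease = Fraction(9, 10)   # P(+ | D): sensitivity
p_pos_given_healthy = Fraction(5, 100)  # P(+ | not D): false positive rate

# Law of total probability: overall chance of a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: P(D | +) = P(+ | D) * P(D) / P(+)
posterior = p_pos_given_disease * p_disease / p_pos

print(posterior, float(posterior))  # 2/13, about 0.154
```

Under these assumed rates, a positive result still leaves the probability of disease at roughly 15%, because the condition is rare.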
How can you spot bias in data?
By checking sampling methods and looking for systematic errors.