Data Science Midterm Exam Study Guide

40 Terms

1

What is the main role of a data scientist?

To analyze and interpret complex data to help organizations make informed decisions and improve performance.

2

What is the purpose of Exploratory Data Analysis (EDA)?

To uncover patterns, detect anomalies, and test assumptions before formal modeling.

3

Name one tool commonly used for EDA.

Visualizations such as histograms and scatter plots, or summary statistics such as the mean and standard deviation.

4

What is structured data?

Data organized in tables with rows and columns (e.g., spreadsheets).

5

What is unstructured data?

Data without a predefined format (e.g., text, images, social media posts).

6

Give an example of each: structured and unstructured data.

Structured: sales spreadsheet; Unstructured: customer review text.

7

What is data ethics?

Guidelines for collecting, analyzing, and using data responsibly.

8

Give an example of an ethical concern in data science.

Using facial recognition without consent.

9

Name one benefit of using data science in environmental conservation.

Predicting natural disasters or tracking endangered species.

10

What is an open-source software license?

A license allowing users to freely use, modify, and share software source code.

11

What are two features of Jupyter Notebooks?

1) Interactive code execution, 2) Markdown support for combining text and code.

12

What does the append() method do in Python lists?

Adds a single element to the end of a list.

13

What does the extend() method do?

Adds multiple elements (from another iterable) to the end of a list.
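
A short sketch of the difference between append() and extend(), using made-up lists. Note what happens when a list is passed to append(): it is added as one nested element rather than unpacked.

```python
# append() adds exactly one element; extend() adds each element of an iterable.
numbers = [1, 2]
numbers.append(3)        # numbers is now [1, 2, 3]
numbers.extend([4, 5])   # numbers is now [1, 2, 3, 4, 5]

# Passing a list to append() nests it as a single element.
nested = [1, 2]
nested.append([3, 4])    # nested is now [1, 2, [3, 4]]

print(numbers)
print(nested)
```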

14

What is a Python dictionary?

A collection of key-value pairs used for fast lookups.
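
As an illustration, here is a small dictionary with hypothetical fruit prices. It shows adding a pair, looking up a key, and using get() to avoid a KeyError when a key may be absent.

```python
# Keys map to values; lookups by key do not require scanning the whole collection.
prices = {"apple": 0.50, "banana": 0.25}

prices["cherry"] = 3.00                 # add a new key-value pair
apple_price = prices["apple"]           # direct lookup by key
durian_price = prices.get("durian")     # get() returns None if the key is missing

print(apple_price, durian_price)
```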

15

What is the purpose of the def keyword in Python?

To define a reusable function.
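
A minimal example of def, using a hypothetical helper named mean that is defined once and can then be called wherever it is needed.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# The same function can be reused with different inputs.
average_score = mean([80, 90, 100])
average_height = mean([150, 170])

print(average_score, average_height)
```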

16

What is pandas used for?

Data manipulation and analysis using DataFrames.
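
A brief sketch of typical DataFrame work, assuming pandas is installed. The column names and figures are invented for illustration.

```python
import pandas as pd

# A small DataFrame built from a dictionary of columns (made-up data).
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "revenue": [100, 150, 200, 50],
})

# Filter rows with a boolean condition.
high_revenue = sales[sales["revenue"] > 90]

# Aggregate: total revenue per region.
totals = sales.groupby("region")["revenue"].sum()

print(high_revenue)
print(totals)
```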

17

What is seaborn used for?

Statistical data visualization built on top of matplotlib.

18

What is NumPy used for?

Numerical computing and working with arrays.
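
A small sketch of array arithmetic, assuming NumPy is installed. The temperatures are made up. The point is that an operation applies to every element at once, with no explicit loop.

```python
import numpy as np

# Temperatures in Celsius stored as a NumPy array (illustrative values).
temps_c = np.array([20.0, 25.0, 30.0])

# Arithmetic is applied element-wise across the whole array.
temps_f = temps_c * 9 / 5 + 32

mean_c = temps_c.mean()

print(temps_f, mean_c)
```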

19

What is the formula for the probability of A or B?

P(A or B) = P(A) + P(B) - P(A and B).
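
One way to check the formula is to count outcomes directly. The sketch below assumes a single fair six-sided die, with A = "roll is even" and B = "roll is greater than 4" (both events chosen for illustration). Subtracting P(A and B) prevents the shared outcome 6 from being counted twice.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                   # faces of a fair die
A = {n for n in outcomes if n % 2 == 0}       # {2, 4, 6}
B = {n for n in outcomes if n > 4}            # {5, 6}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


by_formula = prob(A) + prob(B) - prob(A & B)  # 1/2 + 1/3 - 1/6
by_counting = prob(A | B)                     # outcomes {2, 4, 5, 6}

print(by_formula, by_counting)
```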

20

When should you start the y-axis at zero?

For bar charts, to avoid misleading visuals.

21

When should you include a legend?

When using color to represent categories.

22

What does a box plot show?

The distribution of the data based on quartiles (median, first and third quartiles), with potential outliers marked as individual points.
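
The numbers behind a box plot can be computed by hand. This sketch uses Python's statistics module on made-up data and applies the common 1.5 × IQR rule for flagging outliers. The rule and the "inclusive" quartile method are assumptions here, since plotting tools may use other conventions.

```python
from statistics import quantiles

data = [2, 4, 5, 6, 7, 8, 9, 10, 25]  # illustrative values, already sorted

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr, outliers)
```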

23

What does a density plot show?

A smooth curve representing data distribution.

24

How can you make a plot clearer in matplotlib?

Add axis labels, a title, and gridlines.
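
A minimal sketch of these improvements, assuming matplotlib is installed. The data, title, and label text are invented. The "Agg" backend is selected only so the script runs without a display, and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

quarters = [1, 2, 3, 4]
sales = [10, 20, 15, 30]  # made-up values

fig, ax = plt.subplots()
ax.plot(quarters, sales, marker="o", label="Sales")

# Labels and a title tell the reader what each axis means.
ax.set_title("Quarterly sales")
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales (units)")

# Gridlines make values easier to read off the plot.
ax.grid(True)
ax.legend()
```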

25

What are mutually exclusive events?

Events that cannot happen at the same time.

26

What percentage of data falls within 1, 2, and 3 standard deviations in a normal distribution?

About 68%, 95%, and 99.7%, respectively.
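
These percentages can be reproduced from the normal cumulative distribution function. The sketch uses NormalDist from Python's standard library. The helper name share_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Roughly 68%, 95%, and 99.7% for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```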

27

What is the difference between descriptive and inferential statistics?

Descriptive statistics summarize the data you have; inferential statistics use a sample to draw conclusions about a larger population.

28

What is a p-value?

The probability of observing results at least as extreme as yours, assuming the null hypothesis is true.
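
A worked example under assumed numbers: a coin is flipped 10 times and lands heads 9 times. With the null hypothesis that the coin is fair, the one-sided p-value is the probability of getting 9 or more heads.

```python
from math import comb

n_flips = 10
observed_heads = 9  # hypothetical observation

# Under the null (fair coin), each of the 2**n_flips sequences is equally likely.
# Count the sequences with at least as many heads as observed.
extreme_sequences = sum(comb(n_flips, k) for k in range(observed_heads, n_flips + 1))
p_value = extreme_sequences / 2 ** n_flips  # 11 / 1024

print(p_value)
```

A small p-value like this one means the observed result would be unusual if the coin really were fair.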

29

What does a confidence interval show?

A range of values likely to contain the true population parameter, at a stated confidence level (e.g., 95%).
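
A rough sketch of a 95% confidence interval for a mean, using the normal approximation and made-up measurements. For a sample this small a t-based interval is usually preferred, so treat the z-value here as a simplifying assumption.

```python
from statistics import NormalDist, mean, stdev

sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.9, 5.0, 5.0]  # illustrative data

n = len(sample)
sample_mean = mean(sample)
standard_error = stdev(sample) / n ** 0.5

# z-value that leaves 2.5% in each tail (about 1.96).
z = NormalDist().inv_cdf(0.975)

low = sample_mean - z * standard_error
high = sample_mean + z * standard_error

print(round(low, 3), round(high, 3))
```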

30

What is data preprocessing?

Preparing raw data for analysis by cleaning and transforming it.

31

How can missing values be handled?

By removing rows or columns that have many missing values, or by imputing the missing values (e.g., with the mean or median).
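
Both approaches can be sketched with pandas, assuming it is installed. The DataFrame is invented, and median imputation is only one of several possible choices.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, None, 35, 40],
    "city": ["Paris", "Lima", None, "Oslo"],
})

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: impute, here by filling missing ages with the column median.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())

print(dropped)
print(imputed)
```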

32

How can categorical data be structured for analysis?

By encoding categories into numerical values (e.g., one-hot encoding).
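
One-hot encoding can be sketched with pandas.get_dummies, assuming pandas is installed. The "color" column is a made-up example, and dtype=int is passed so the new columns hold 0/1 rather than booleans.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red"]})

# Each category becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```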

33

What do .loc and .iloc do in pandas?

.loc selects by row and column labels; .iloc selects by integer position.
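
A short comparison on a made-up DataFrame, assuming pandas is installed. One detail worth remembering: label slices with .loc include the end label, while position slices with .iloc exclude the end position.

```python
import pandas as pd

scores = pd.DataFrame(
    {"math": [90, 75, 60], "art": [70, 85, 95]},
    index=["ana", "ben", "cy"],
)

by_label = scores.loc["ben", "math"]   # row label "ben", column label "math"
by_position = scores.iloc[1, 0]        # second row, first column

label_slice = scores.loc["ana":"ben"]  # end label included: 2 rows
position_slice = scores.iloc[0:1]      # end position excluded: 1 row

print(by_label, by_position, len(label_slice), len(position_slice))
```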

34

What is Missing Completely At Random (MCAR)?

Missingness is unrelated to any variable, observed or unobserved.

35

What is Missing At Random (MAR)?

Missingness depends on observed data but not the missing value itself.

36

What is Missing Not At Random (MNAR)?

Missingness depends on the missing value itself.

37

What is self-selection bias?

When participants choose to join a study, leading to non-representative samples.

38

What is an outlier?

A data point that differs significantly from others.
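
One simple way to flag such points is a z-score: how many standard deviations a value lies from the mean. The readings and the 2.5 cutoff below are assumptions chosen for illustration; other cutoffs and methods are also in use.

```python
from statistics import mean, stdev

readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]  # made-up data

center = mean(readings)
spread = stdev(readings)

CUTOFF = 2.5  # illustrative threshold, not a universal standard

# Flag values whose distance from the mean exceeds CUTOFF standard deviations.
outliers = [x for x in readings if abs(x - center) / spread > CUTOFF]

print(outliers)
```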

39

What is Bayes’ Theorem used for?

Calculating conditional probabilities, i.e., updating the probability of an event given new evidence: P(A|B) = P(B|A) × P(A) / P(B).
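
A worked example with hypothetical numbers: a condition affects 1% of people, a test detects it 95% of the time, and the test gives a false positive 5% of the time. Bayes' Theorem gives the probability of having the condition after a positive result.

```python
from fractions import Fraction

# All three inputs are assumed values for illustration.
p_condition = Fraction(1, 100)             # P(condition)
p_pos_given_condition = Fraction(95, 100)  # P(positive | condition)
p_pos_given_healthy = Fraction(5, 100)     # P(positive | no condition)

# Total probability of a positive test across both groups.
p_positive = (p_pos_given_condition * p_condition
              + p_pos_given_healthy * (1 - p_condition))

# Bayes' Theorem: P(condition | positive).
posterior = p_pos_given_condition * p_condition / p_positive

print(posterior, float(posterior))
```

Even with a fairly accurate test, the posterior stays low (about 16%) because the condition is rare to begin with.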

40

How can you spot bias in data?

By checking sampling methods and looking for systematic errors.