Python for Data Science and Machine Learning Concepts

Description and Tags

These flashcards cover essential concepts in Python for data science, machine learning fundamentals, and cybersecurity principles.

283 Terms

1
New cards

What is broadcasting?

The set of rules NumPy uses to perform element-wise operations on arrays of different shapes, by virtually stretching the smaller array to match the larger one
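
A minimal sketch of broadcasting, assuming NumPy is installed. The array names are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The 1-D row is virtually stretched across both rows of the matrix,
# so no explicit loop or copy is needed.
result = matrix + row
print(result.shape)  # (2, 3)
```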

2
New cards

What does dtype return?

The data type of the elements in the NumPy array (for example int64 or float64)

3
New cards

What does shape return?

A tuple with the size of the array along each dimension

4
New cards

What is the difference between np.argmin and np.min?

np.argmin returns the index of the minimum value; np.min returns the minimum value itself
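
The dtype, shape, and argmin/min cards above can be checked with a short sketch like the following (the array is made up for illustration). Note that without an axis argument, np.argmin returns an index into the flattened array.

```python
import numpy as np

values = np.array([[7.0, 2.0, 9.0],
                   [4.0, 8.0, 1.0]])

print(values.dtype)   # float64
print(values.shape)   # (2, 3)

min_value = np.min(values)      # the smallest element
min_index = np.argmin(values)   # its position in the flattened array
print(min_value, min_index)     # 1.0 5
```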

5
New cards

What is the difference between iloc and loc?

iloc selects by integer position (the implicit index); loc selects by label (the explicit index)
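
A small sketch of the difference, assuming pandas is installed. The Series and its labels are hypothetical.

```python
import pandas as pd

scores = pd.Series([90, 85, 70], index=["ana", "ben", "cara"])

by_position = scores.iloc[1]    # second element, regardless of its label
by_label = scores.loc["cara"]   # element whose label is "cara"
print(by_position, by_label)    # 85 70
```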

6
New cards

What kind of operations can be performed in pandas?

Data can be cleaned, transformed, manipulated and analyzed

7
New cards

Approaches to create a DataFrame in Pandas

  1. From a list of dictionaries

  2. From a series

  3. From a dictionary with multiple series

  4. From a file
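
The four approaches above could look roughly like this. It is a sketch with made-up data; an in-memory io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# 1. From a list of dictionaries (one dictionary per row)
from_records = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# 2. From a single Series
from_series = pd.Series([1, 3], name="a").to_frame()

# 3. From a dictionary of Series (one Series per column)
from_series_dict = pd.DataFrame({"a": pd.Series([1, 3]),
                                 "b": pd.Series([2, 4])})

# 4. From a file (here an in-memory CSV used as a stand-in)
csv_text = io.StringIO("a,b\n1,2\n3,4\n")
from_file = pd.read_csv(csv_text)
```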

8
New cards

What does a masking operation in Pandas do?

It replaces all the values for which the condition is True (the inverse of where, which replaces values where the condition is False)
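
A minimal sketch of masking with DataFrame.mask. The column name and the replacement value 0 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"temp": [18, 25, 31]})

# Replace every value where the condition is True with 0.
masked = df.mask(df > 30, other=0)
print(masked["temp"].tolist())  # [18, 25, 0]
```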

9
New cards

Which function is used to encrypt in Pandas?

sin function

10
New cards

How does reset_index work in pandas?

It moves the current index into the first column and resets the index to the default integer index.
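
As a quick illustration with hypothetical data, an unnamed index becomes a column called "index":

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 85]}, index=["ana", "ben"])

reset = df.reset_index()
print(list(reset.columns))  # ['index', 'score']
print(list(reset.index))    # [0, 1]
```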

11
New cards

What does margins do in the pandas crosstab function?

It adds row and column totals to the crosstab output.
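
A sketch of margins=True on a small made-up table. The totals appear in a row and a column labelled "All" (the pandas default name).

```python
import pandas as pd

df = pd.DataFrame({"gender": ["F", "M", "F", "M"],
                   "passed": ["yes", "no", "yes", "yes"]})

table = pd.crosstab(df["gender"], df["passed"], margins=True)
print(table)
```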

12
New cards

Types of plots in matplotlib

scatter, histogram, pie, bar, boxplot, line chart

13
New cards

What library evaluates machine learning model performance?

sklearn.metrics (the metrics module of scikit-learn)

14
New cards

What library is used for data manipulation?

Pandas is used for data manipulation.

15
New cards

JSON files cannot be imported in Pandas T/F

False

16
New cards

A Matplotlib plot cannot be 3D T/F

False. Matplotlib supports 3D plots through its mplot3d toolkit

17
New cards

What is a DataFrame?

A two-dimensional labelled data structure made up of Series, where each column is a different Series.

18
New cards

What is a vectorized operation?

An element wise operation on a complete array without using loops
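
A short sketch contrasting a vectorized operation with the equivalent explicit loop. The prices and the factor 1.5 are arbitrary.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# Vectorized: one expression applied to every element at once.
vectorized = prices * 1.5

# Equivalent explicit loop, shown only for comparison.
looped = [price * 1.5 for price in prices]

print(vectorized.tolist())  # [15.0, 30.0, 45.0]
```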

19
New cards

What is slicing in Numpy?

A way to access a subset of an ndarray using start:stop:step notation

20
New cards

What is a series?

one dimensional labelled array

21
New cards

Does axis = 1 refer to rows or columns?

columns

22
New cards

Method used to apply a function along an axis of a DataFrame

apply
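
A sketch of apply along each axis, using a made-up range function. With axis=0 the function receives each column; with axis=1 it moves across the columns, so it receives each row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

def value_range(series):
    """Difference between the largest and smallest value."""
    return series.max() - series.min()

per_column = df.apply(value_range, axis=0)  # one result per column
per_row = df.apply(value_range, axis=1)     # one result per row

print(per_column.tolist())  # [1, 10]
print(per_row.tolist())     # [9, 18]
```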

23
New cards

Types of joins in Pandas

Left join

Inner join

Right join

Outer join

24
New cards

When you are joining on a specific column, should you use join or merge?

merge

25
New cards

When you are joining on a specific index, should you use join or merge?

join
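
The join types and the merge/join distinction from the last three cards can be sketched as follows. The two tables are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["ana", "ben", "cara"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [85, 70, 60]})

# merge: combine on a column; `how` selects the join type.
inner = left.merge(right, on="id", how="inner")   # ids present in both
outer = left.merge(right, on="id", how="outer")   # ids present in either

# join: combine on the index.
by_index = left.set_index("id").join(right.set_index("id"), how="left")

print(inner["id"].tolist())     # [2, 3]
print(by_index.index.tolist())  # [1, 2, 3]
```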

26
New cards

function used in Pandas to reshape a data frame from a wide format to a long format

melt
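
A minimal sketch of melt on a made-up wide table. The var_name and value_name labels are illustrative choices.

```python
import pandas as pd

wide = pd.DataFrame({"student": ["ana", "ben"],
                     "math": [90, 80],
                     "art": [70, 60]})

# Each subject column becomes rows of (student, subject, score).
long = wide.melt(id_vars="student", var_name="subject", value_name="score")
print(long.shape)  # (4, 3)
```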

27
New cards

When to use pivot table in pandas?

When you want to summarize data by multiple dimensions

28
New cards

differences between pivot table and cross tab

pivot_table is more general: it can aggregate values in many ways (sum, mean, count and so on) and handle more complex data structures. crosstab is designed for creating frequency tables and computes counts by default, although it also accepts values and aggfunc arguments.
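
A sketch comparing the two on the same hypothetical sales data: pivot_table aggregates a value column, while crosstab counts occurrences by default.

```python
import math
import pandas as pd

sales = pd.DataFrame({"region": ["N", "N", "S", "S"],
                      "product": ["x", "y", "x", "x"],
                      "amount": [10, 20, 30, 40]})

# Sum of `amount` for every region/product combination.
totals = sales.pivot_table(values="amount", index="region",
                           columns="product", aggfunc="sum")

# Number of rows for every region/product combination.
counts = pd.crosstab(sales["region"], sales["product"])

print(totals)
print(counts)
```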

29
New cards

Pandas has options to visualize data T/F

True

30
New cards

What is a figure in plotting?

the top level container of a plot that acts as a window or a page on which everything is drawn

31
New cards

What are axes in plotting?

The area in which data is plotted

32
New cards

Matplotlib imitates…

The MATLAB plotting interface

33
New cards

What is the first step to create a plot in Matplotlib when you already have the data?

creating a figure

34
New cards

How do you avoid overlaps in subplots?

Call fig.tight_layout()

35
New cards

How do you add a title and labels to the axes? Give the syntax.

ax.set_title(), ax.set_xlabel(), ax.set_ylabel()

36
New cards

How do you add axes to your plot? Give the syntax.

fig.add_axes() on an existing figure (fig.add_subplot() and plt.subplots() also create axes)

37
New cards

how do you make a figure? Give the syntax

plt.figure()

38
New cards

Type of plot in Matplotlib used to analyze historic variations and trends in data

Line chart

39
New cards

What composes the five-number summary?

minimum

q1

median

q3

maximum
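
The five values above can be computed with NumPy percentiles, as in this sketch. The sample data is made up, and np.percentile uses linear interpolation by default.

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# minimum, Q1, median, Q3, maximum
summary = np.percentile(data, [0, 25, 50, 75, 100])
print(summary.tolist())  # [1.0, 3.0, 5.0, 7.0, 9.0]
```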

40
New cards

which type of graph gives you the 5 number summary?

boxplot

41
New cards

syntax to add a boxplot

ax.boxplot(data)
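
Putting the plotting cards together, a minimal sketch might look like this. The Agg backend is selected only so the code runs without a display; the data and label texts are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

fig = plt.figure()               # 1. create the figure (top-level container)
ax = fig.add_subplot(1, 1, 1)    # 2. add axes, the area where data is drawn
ax.boxplot(data)                 # 3. draw the boxplot (five-number summary)
ax.set_title("Five-number summary")
ax.set_xlabel("sample")
ax.set_ylabel("value")
fig.tight_layout()               # 4. avoid overlapping labels and subplots
```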

42
New cards

Machine Learning is deterministic T/F

False

43
New cards

Two main types of AI

narrow, general

44
New cards

What is narrow AI

AI specialized to certain features or jobs

45
New cards

Examples of narrow AI

Board games, self driving cars, virtual assistants

46
New cards

What is General AI?

machines that can perform any task like humans, including reasoning, problem-solving, and being creative.

47
New cards

Supervised learning techniques for numerical data

linear regression, multiple regression, decision tree, SVM

48
New cards

Supervised learning techniques for categorical data

KNN, SVM, random forest, logistic regression, decision trees

49
New cards

Unsupervised learning techniques

K means clustering, DBSCAN

50
New cards

SVM is just for classification problems T/F

False. It can be applied also to regression problems

51
New cards

Two main types of supervised learning techniques

regression, classification

52
New cards

Two types of multiclass SVM

One vs one

One vs rest (one vs many)

53
New cards

What are the support vectors in SVM?

The data points closest to the dividing line (the hyperplane), which determine its position

54
New cards

What is the hyperplane in SVM?

The decision boundary used to predict whether a point belongs to class A or class B

55
New cards

what is the margin in SVM?

the distance between the support vectors and the hyperplane

56
New cards

What is the purpose of the confusion matrix?

To summarize the performance of a classification model by comparing the predicted values against the real values

57
New cards

Cases where SVM has a good performance

Balanced data

Few classes (preferably 2)

High dimensional data

58
New cards

What is the coefficient of determination?

A metric used for evaluating the accuracy of a model that indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.

59
New cards

How does SVM work?

It works by finding the optimal hyperplane that maximizes the margin between different classes in the training data.

60
New cards

What is the role of a kernel in SVM?

Mapping the data into a higher-dimensional space so that the classes become easier to separate.

61
New cards

Types of kernels

Linear, polynomial, radial basis function (RBF), and sigmoid
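
A small sketch of an SVM classifier, assuming scikit-learn is installed. The toy points are made up and clearly separable; the kernel argument of SVC accepts "linear", "poly", "rbf" and "sigmoid", matching the kernel types listed above.

```python
from sklearn.svm import SVC

# Two well-separated groups of 2-D points (hypothetical data).
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

model = SVC(kernel="linear")   # try "poly", "rbf" or "sigmoid" as alternatives
model.fit(X, y)

predictions = model.predict([[0.5, 0.5], [3.5, 3.5]])
print(predictions.tolist())          # [0, 1]
print(len(model.support_vectors_))   # number of support vectors found
```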

62
New cards

Decision trees are only for classification problems T/F

False. They are also used for regression problems

63
New cards

What is a decision tree?

A flowchart-like structure used for decision making and predictive modeling, where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label.

64
New cards

Components of a decision tree

leaf nodes

Root node

branches

internal nodes

65
New cards

What does each leaf node represent in a decision tree?

A class label

66
New cards

What does a branch represent on a decision tree?

the outcome of a test

67
New cards

What are some metrics for deciding when to split a tree?

GINI index, entropy, variance
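
The Gini index and entropy can be computed by hand for a list of class labels, as in this standard-library sketch. The function names are hypothetical helpers, not a library API.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

mixed = ["a", "a", "b", "b"]   # evenly mixed node: maximum impurity for 2 classes
pure = ["a", "a", "a", "a"]    # pure node: no impurity

print(gini(mixed), entropy(mixed))  # 0.5 1.0
```

A split is usually chosen so that the child nodes are purer (lower Gini or entropy) than the parent.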

68
New cards

What do internal nodes in a tree represent?

Tests on attributes

69
New cards

KNN can be used for classification and regression problems T/F

True

70
New cards

How does KNN measure the distance between neighbors?

With euclidean distance, Manhattan or cosine similarity

71
New cards

What does KNN do?

Classifies data points based on labels of their nearest neighbors
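
A from-scratch sketch of KNN classification using Euclidean distance and a majority vote. The function name, the default k=3 and the toy points are assumptions for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))  # Euclidean distance
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5)))  # a
print(knn_predict(points, labels, (5.5, 5.5)))  # b
```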

72
New cards

What is an ensemble method?

A technique that combines multiple machine learning models to improve predictive performance.

73
New cards

What is Random Forest?

An ensemble method that uses multiple decision trees to make predictions and improve accuracy

74
New cards

What is BIAS in ML models?

Bias comes from erroneous assumptions about how the world works. It is the error introduced by approximating a real-world problem, which may be complex, with a much simpler model

75
New cards

What is Variance in ML models?

Variance is the error introduced by the model's sensitivity to small fluctuations in the training set, so that the model fits noise in the training data

76
New cards

High bias makes a model overfit T/F

False. It makes the model underfit

77
New cards

High variance makes the model overfit T/F

True

78
New cards

Two types of ensemble methods

bagging

Boosting

79
New cards

What technique is bootstrap + aggregation?

Bagging

80
New cards

How does bagging work?

Multiple random samples of the training data are drawn (bootstrap samples, with replacement) and a model is trained on each; a single validation set simulates the test data. The predictions of the models trained on those samples are then aggregated to produce the final output
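
A standard-library sketch of bagging: bootstrap samples, one weak learner per sample, and a majority vote. The threshold "stump" learner, the function names and the data are all hypothetical simplifications, not a library implementation.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a 1-D threshold classifier: predict 1 when x is above the threshold."""
    zeros = [x for x, label in sample if label == 0]
    ones = [x for x, label in sample if label == 1]
    if zeros and ones:
        # Midpoint between the two class means.
        return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    # Degenerate bootstrap sample with one class only: fall back to the mean.
    xs = [x for x, _ in sample]
    return sum(xs) / len(xs)

def bagging_predict(data, query, n_models=5, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = rng.choices(data, k=len(data))  # bootstrap: draw with replacement
        threshold = train_stump(sample)          # independent learner per sample
        votes.append(1 if query > threshold else 0)
    return Counter(votes).most_common(1)[0][0]   # aggregate by majority vote

data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(11, 16)]
print(bagging_predict(data, 0), bagging_predict(data, 20))  # 0 1
```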

81
New cards

What does boosting do?

It combines multiple weak learners to create a strong learner by focusing on the errors of prior models.

82
New cards

What is the difference between bagging and boosting?

Bagging trains independent learners (which can run in parallel); boosting trains sequential learners, each one focusing on the errors of the previous ones

83
New cards

Random Forest is an example of boosting T/F

False. It is an example of bagging

84
New cards

What is dark data?

Data that has not been analyzed yet for improving services and products

85
New cards

What percentage of data is dark, according to IBM?

80%

86
New cards

Online content from which consumers can buy directly, without being redirected to another site.

Contextual commerce

87
New cards

Tone analyzer and personality insights are key cognitive services provided by

IBM Watson

88
New cards

What is reinforcement learning?

A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.

89
New cards

What is an agent in reinforcement learning?

An entity that interacts with the environment, makes decisions, and learns from the outcomes to improve its performance.

90
New cards

What is the environment in reinforcement learning?

The environment provides feedback to the agent's actions in the form of rewards or penalties, guiding the learning process.

91
New cards

What is reward of reinforcement learning?

signal received by the agent from the environment indicating the immediate benefit of an action taken.

92
New cards

What is policy in reinforcement learning?

A strategy used by the agent to determine the next action based on the current state of the environment.
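
The agent, environment, reward and policy from the cards above can be illustrated with a tiny two-action bandit. This is a simplified sketch: the fixed reward list plays the role of the environment, the policy is epsilon-greedy, and all names and parameter values are assumptions.

```python
import random

def run_bandit(rewards, steps=50, epsilon=0.1, seed=0):
    """Agent that learns the value of each action from the rewards it receives."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions   # the agent's current value estimates
    counts = [0] * n_actions
    for step in range(steps):
        if step < n_actions:
            action = step                       # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)   # explore: random action
        else:
            # Exploit: the policy picks the action with the best estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rewards[action]                # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = run_bandit([0.0, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, best_action)  # [0.0, 1.0] 1
```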

93
New cards

Cloud machine learning from Amazon

Amazon SageMaker

94
New cards

Distributed/parallel computing environment for handling massive amounts of data is achieved using ______. (2 tools)

Hadoop and Spark

95
New cards

What is machine learning?

The ability of a computer to learn without being explicitly programmed. It learns patterns from input datasets and applies the learning to automatically make predictions for new data.

96
New cards

Evaluation metrics for regression problems

R2

Adjusted R2

RMSE

97
New cards

Evaluation metrics for classification problems

Precision

recall

F1 score
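
The three metrics above can be derived from confusion-matrix counts, as in this standard-library sketch. The labels are made up and the function name is hypothetical; sklearn.metrics offers ready-made equivalents.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]

precision, recall, f1 = classification_metrics(actual, predicted)
print(precision, recall, f1)  # 0.75 0.75 0.75
```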

98
New cards

What is another name for coefficient of determination?

R2 (R-squared)

99
New cards

What is the formula for R2?

Sum of Squares Regression (SSR) / Sum of Squares Total (SST)

100
New cards

What does Sum of Squares Regression represent?

The sum of the squared differences between the predictions and the mean of the target values
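
A worked sketch of R2 = SSR / SST in plain Python. It assumes the predictions come from a least-squares line with an intercept, in which case SSR / SST equals 1 - SSE / SST; the data and helper names are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(ys, predictions):
    """SSR / SST: explained variation over total variation."""
    mean_y = sum(ys) / len(ys)
    ssr = sum((p - mean_y) ** 2 for p in predictions)  # predictions vs mean
    sst = sum((y - mean_y) ** 2 for y in ys)           # targets vs mean
    return ssr / sst

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]

r2 = r_squared(ys, predictions)
print(round(r2, 4))  # 0.9627
```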