These flashcards cover essential concepts in Python for data science, machine learning fundamentals, and cybersecurity principles.
What is broadcasting?
NumPy's mechanism for performing element-wise operations on arrays of different shapes by virtually stretching the smaller array to match the larger one
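A minimal sketch of broadcasting, assuming a small made-up matrix and row vector (the names `matrix` and `row` are illustrative only):

```python
import numpy as np

# A (3, 3) matrix and a (3,) vector have different shapes.
matrix = np.arange(9).reshape(3, 3)
row = np.array([10, 20, 30])

# NumPy virtually repeats `row` for every row of `matrix`,
# so the addition is element-wise without any explicit loop.
result = matrix + row
print(result)
```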
What does dtype return?
The data type of the elements of the NumPy array (for example int64 or float64)
What does shape return?
A tuple with the size of the array along each dimension
What is the difference between np.argmin and np.min?
np.argmin returns the index of the minimum value; np.min returns the minimum value itself
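The attributes and functions from the last few cards can be tried together. This is a small sketch on an invented array; note that `np.argmin` without an `axis` argument indexes into the flattened array:

```python
import numpy as np

values = np.array([[4.0, 2.5, 9.0],
                   [7.0, 1.5, 3.0]])

print(values.dtype)   # element type of the array
print(values.shape)   # size along each dimension

min_value = np.min(values)      # the smallest value
min_index = np.argmin(values)   # its position in the flattened array
row_col = np.unravel_index(min_index, values.shape)  # same position as (row, column)
```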
What is the difference between iloc and loc?
iloc selects by implicit integer position; loc selects by label (the explicit index)
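A short sketch of the two indexers, assuming a made-up Series with string labels:

```python
import pandas as pd

prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "pear", "kiwi"])

by_position = prices.iloc[1]    # second element, regardless of its label
by_label = prices.loc["kiwi"]   # element whose label is "kiwi"
```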
What kind of operations can be performed in pandas?
Data can be cleaned, transformed, manipulated and analyzed
Approaches to create a DataFrame in Pandas
From a list of dictionaries
From a series
From a dictionary with multiple series
From a file
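The four approaches above can be sketched as follows. The data is invented, and an in-memory `io.StringIO` buffer stands in for a real CSV file so the example is self-contained:

```python
import io
import pandas as pd

# From a list of dictionaries (one dictionary per row).
from_dicts = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# From a single Series (becomes a one-column DataFrame).
from_series = pd.Series([1, 3], name="a").to_frame()

# From a dictionary with multiple Series (one Series per column).
from_series_dict = pd.DataFrame({"a": pd.Series([1, 3]), "b": pd.Series([2, 4])})

# From a file; here a text buffer plays the role of the file.
csv_text = io.StringIO("a,b\n1,2\n3,4\n")
from_file = pd.read_csv(csv_text)
```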
What does a masking operation in Pandas do?
It replaces every value for which the condition is True and keeps the values for which it is False
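A minimal sketch using `Series.mask`, with an invented list of scores and a replacement value of 0 chosen only for illustration:

```python
import pandas as pd

scores = pd.Series([45, 80, 30, 95])

# Values where the condition is True are replaced; the rest are kept.
masked = scores.mask(scores < 50, 0)
print(masked.tolist())
```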
Which function is used to encrypt in Pandas?
sin function
How does reset_index work in pandas?
It moves the current index into the first column and resets the index to the default integer index.
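A small sketch of `reset_index` on a made-up DataFrame. When the index has no name, pandas calls the new column `index`:

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 75]}, index=["ana", "luis"])

# The old index becomes the first column; a 0..n-1 integer index replaces it.
reset = df.reset_index()
print(reset)
```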
What does margins do on a crosstab function on Pandas?
It adds row and column totals to the crosstab output.
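A sketch of `margins=True` on invented data. By default pandas labels the totals row and column `All`:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "plan": ["free", "paid", "free", "free", "paid"],
})

# Frequency table of city vs. plan, plus an "All" row and column with totals.
table = pd.crosstab(df["city"], df["plan"], margins=True)
print(table)
```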
Types of plots in matplotlib
scatter, histogram, pie, bar, boxplot, line chart
What library evaluates machine learning model performance?
sklearn.metrics (the metrics module of scikit-learn)
What library is used for data manipulation?
Pandas is used for data manipulation.
JSON files cannot be imported in Pandas T/F
False
A Matplotlib plot cannot be 3D T/F
False. Matplotlib supports 3D plots through its mplot3d toolkit
What is a DataFrame?
A two-dimensional labeled data structure made up of Series, where each column is a Series.
What is a vectorized operation?
An element wise operation on a complete array without using loops
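A sketch comparing a vectorized expression with the equivalent Python loop, using an invented temperature conversion:

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Vectorized: one expression applied to the whole array at once.
fahrenheit = celsius * 9 / 5 + 32

# Equivalent explicit loop, shown only for comparison.
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius]
```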
What is slicing in NumPy?
A way to access a sub-range of an ndarray's elements using start:stop:step notation
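A few slicing patterns on a made-up 3x4 array, as a quick sketch:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

first_row = grid[0, :]        # every column of row 0
last_two_cols = grid[:, -2:]  # last two columns of every row
every_other = grid[0, ::2]    # row 0, stepping by 2
```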
What is a series?
A one-dimensional labeled array
Does axis = 1 refer to rows or columns?
Columns
Method used to apply a function along an axis of a DataFrame
apply
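A sketch of `DataFrame.apply` along both axes. The data and the range function are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0: the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row, i.e. it works across the columns.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)
```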
Types of joins in Pandas
Left join
Inner join
Right join
Outer join
When you are joining on a specific column, should you use join or merge?
merge
When you are joining on a specific index, should you use join or merge?
join
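A sketch of both methods on two invented tables. `merge` matches rows on a column, while `join` matches on the index; the `how` argument selects the join type (left, inner, right, outer):

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Luis", "Eva"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "total": [20, 35, 50]})

# merge: match rows on a shared column.
merged = customers.merge(orders, on="customer_id", how="inner")

# join: match rows on the index of both frames.
left = customers.set_index("customer_id")
totals = orders.groupby("customer_id")["total"].sum().to_frame()
joined = left.join(totals, how="left")
```

Customer 2 has no orders, so the inner merge drops that row while the left join keeps it with a missing total.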
Function used in Pandas to reshape a DataFrame from a wide format to a long format
melt
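A sketch of `melt` on a made-up grades table. The column names passed as `var_name` and `value_name` are arbitrary choices:

```python
import pandas as pd

wide = pd.DataFrame({
    "student": ["Ana", "Luis"],
    "math": [90, 70],
    "art": [85, 95],
})

# Each (student, subject) pair becomes its own row.
long = wide.melt(id_vars="student", var_name="subject", value_name="grade")
print(long)
```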
When to use pivot table in pandas?
When you want to summarize data by multiple dimensions
Differences between pivot_table and crosstab
pivot_table is more general: it can aggregate values in many ways (sum, mean, count, and so on) and handle more complex data structures. crosstab is designed for creating frequency tables and computes counts by default.
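A side-by-side sketch on invented sales data: `pivot_table` aggregates a value column (here with the mean), while `crosstab` with its default arguments counts occurrences:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "product": ["tea", "tea", "tea", "coffee"],
    "amount": [10, 30, 5, 20],
})

# Mean amount per region and product.
mean_amount = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="mean"
)

# Number of rows per region and product.
counts = pd.crosstab(sales["region"], sales["product"])
```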
Pandas has options to visualize data T/F
True
What is a figure in plotting?
the top level container of a plot that acts as a window or a page on which everything is drawn
What are axes in plotting?
The area in which data is plotted
Matplotlib imitates…
MATLAB plotting interface
What is the first step to create a plot in Matplotlib when you already have the data?
creating a figure
How do you avoid overlaps in subplots?
Call fig.tight_layout()
How do you add titles and labels to the axes? Give the syntax.
ax.set_title(), ax.set_xlabel(), ax.set_ylabel()
How do you add axes to your plot? Give the syntax.
fig.add_axes()
How do you make a figure? Give the syntax.
plt.figure()
Type of plot in Matplotlib used to analyze historic variations and trends in data
Line chart
What makes up the five-number summary?
Minimum
Q1 (first quartile)
Median
Q3 (third quartile)
Maximum
Which type of graph gives you the five-number summary?
Boxplot
Syntax to add a boxplot
ax.boxplot(data)
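The plotting cards above can be combined into one short sketch: create a figure, add axes, draw a boxplot, label it, and tidy the layout. The data is invented, the `Agg` backend is an assumption so the script runs without a display, and `np.percentile` computes the five-number summary that the boxplot visualizes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

fig = plt.figure()              # top-level container
ax = fig.add_subplot(1, 1, 1)   # the area where data is drawn
ax.boxplot(data)
ax.set_title("Example boxplot")
ax.set_xlabel("sample")
ax.set_ylabel("value")
fig.tight_layout()              # avoid overlapping labels

# Minimum, Q1, median, Q3, maximum.
summary = np.percentile(data, [0, 25, 50, 75, 100])
```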
Machine Learning is deterministic T/F
False
Two main types of AI
Narrow and general
What is narrow AI?
AI specialized in specific tasks or jobs
Examples of narrow AI
Board games, self driving cars, virtual assistants
What is General AI?
Machines that can perform any task like humans, including reasoning, problem-solving, and being creative.
Supervised learning techniques for numerical data
linear regression, multiple regression, decision tree, SVM
Supervised learning techniques for categorical data
KNN, SVM, random forest, logistic regression, decision trees
Unsupervised learning techniques
K-means clustering, DBSCAN
SVM is just for classification problems T/F
False. It can also be applied to regression problems
Two main types of supervised learning techniques
regression, classification
Two multiclass strategies for SVM
One vs one
One vs rest (one vs many)
What are the support vectors in SVM?
The data points closest to the decision boundary (the hyperplane)
What is the hyperplane in SVM?
The decision boundary that separates the classes, for example class A from class B
What is the margin in SVM?
The distance between the support vectors and the hyperplane
What is the purpose of the confusion matrix?
To summarize a classifier's performance by comparing the predicted labels against the actual labels (true and false positives and negatives)
Cases where SVM has a good performance
Balanced data
Few classes (preferably 2)
High dimensional data
What is the coefficient of determination?
A metric used for evaluating the accuracy of a model that indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.
How does SVM work?
It works by finding the optimal hyperplane that maximizes the margin between different classes in the training data.
What is the role of a kernel in SVM?
Mapping the data into a higher-dimensional space to make the classes easier to separate.
Types of kernels
Linear, polynomial, radial basis function (RBF), and sigmoid
Decision trees are only for classification problems T/F
False. They are also used for regression problems
What is a decision tree?
A flowchart-like structure used for decision making and predictive modeling, where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label.
Components of a decision tree
leaf nodes
Root node
branches
internal nodes
What does each leaf node represent in a decision tree?
A class label
What does a branch represent in a decision tree?
The outcome of a test
What are some metrics for deciding when to split a tree?
Gini index, entropy, variance
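A sketch of two of these split metrics for classification, computed from class proportions with the standard formulas (Gini = 1 - sum of p squared, entropy = -sum of p times log2 p). The helper names `gini` and `entropy` are illustrative:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 0 for a pure node, higher for mixed nodes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

mixed = ["a", "a", "b", "b"]
pure = ["a", "a", "a", "a"]
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

A split is usually chosen so that the child nodes are purer (lower impurity) than the parent.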
What do internal nodes in a tree represent?
Tests on attributes
KNN can be used for classification and regression problems T/F
True
How does KNN measure the distance between neighbors?
With Euclidean distance, Manhattan distance, or cosine similarity
What does KNN do?
Classifies data points based on labels of their nearest neighbors
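A minimal from-scratch sketch of KNN classification, assuming Euclidean distance, majority voting, and a tiny invented dataset. The function name `knn_predict` is hypothetical and not a library API:

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.2, 1.1])))
print(knn_predict(X_train, y_train, np.array([8.2, 8.4])))
```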
What is an ensemble method?
A technique that combines multiple machine learning models to improve performance.
What is Random Forest?
An ensemble method that uses multiple decision trees to make predictions and improve accuracy
What is bias in ML models?
Bias is the error introduced by erroneous or overly simple assumptions, that is, by approximating a real-world problem, which may be complex, with a much simpler model
What is variance in ML models?
Variance is the error introduced by the model's sensitivity to small fluctuations, such as noise, in the training set.
Bias makes the model overfit T/F
False. It makes the model underfit
Variance makes the model overfit T/F
True
Two types of ensemble methods
Bagging
Boosting
What technique is bootstrap + aggregation?
Bagging
How does bagging work?
Multiple random samples of the training data are drawn, and a model is trained on each one; a single held-out validation set simulates the test data. The predictions of the models trained on those samples are then aggregated to produce the final output
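A minimal sketch of bagging for regression, assuming synthetic noisy linear data and a straight-line fit (`np.polyfit`) as the base learner; real bagging commonly uses decision trees instead. Each model is trained on a bootstrap sample (drawn with replacement) and the predictions are averaged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus noise (an assumption for illustration).
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

n_models = 25
models = []
for _ in range(n_models):
    # Bootstrap: sample row indices with replacement.
    idx = rng.integers(0, x.size, size=x.size)
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    models.append((slope, intercept))

def bagged_predict(x_new):
    # Aggregation: average the predictions of all base models.
    preds = [s * x_new + b for s, b in models]
    return np.mean(preds, axis=0)

print(bagged_predict(5.0))
```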
What does boosting do?
It combines multiple weak learners to create a strong learner by focusing on the errors of prior models.
What is the difference between bagging and boosting?
Bagging trains independent learners, while boosting trains learners sequentially
Random Forest is an example of boosting T/F
False. It is an example of bagging
What is dark data?
Data that has not been analyzed yet for improving services and products
What percentage of data is dark, according to IBM?
80%
Online content from which consumers can buy directly, without being redirected to another site.
Contextual commerce
Tone analyzer and personality insights are key cognitive services provided by
IBM Watson
What is reinforcement learning?
A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.
What is an agent in reinforcement learning?
An entity that interacts with the environment, makes decisions, and learns from the outcomes to improve its performance.
What is the environment in reinforcement learning?
The environment provides feedback to the agent's actions in the form of rewards or penalties, guiding the learning process.
What is the reward in reinforcement learning?
A signal received by the agent from the environment indicating the immediate benefit of an action taken.
What is policy in reinforcement learning?
A strategy used by the agent to determine the next action based on the current state of the environment.
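The agent, environment, reward, and policy can be seen together in a toy two-armed bandit. This is only a sketch: the reward probabilities, the epsilon-greedy policy, and all names are assumptions chosen for illustration:

```python
import random

random.seed(0)

# Environment: each action pays a reward of 1 with a hidden probability.
reward_prob = {"left": 0.2, "right": 0.8}

# Agent state: estimated value of each action and how often it was tried.
value = {"left": 0.0, "right": 0.0}
counts = {"left": 0, "right": 0}
epsilon = 0.1  # fraction of steps spent exploring

def policy():
    # Epsilon-greedy: mostly pick the best-looking action, sometimes explore.
    if random.random() < epsilon:
        return random.choice(list(value))
    return max(value, key=value.get)

total_reward = 0
for _ in range(2000):
    action = policy()
    reward = 1 if random.random() < reward_prob[action] else 0
    counts[action] += 1
    # Incremental average of the rewards observed for this action.
    value[action] += (reward - value[action]) / counts[action]
    total_reward += reward

print(value, counts, total_reward)
```

Over many steps the agent's estimates approach the hidden probabilities, so it ends up choosing the better action most of the time.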
Cloud machine learning from Amazon
Amazon SageMaker
Distributed/parallel computing environment for handling massive amounts of data is achieved using ______. (2 tools)
Hadoop and Spark
What is machine learning?
The ability of a system to learn without being explicitly programmed. It learns patterns from input datasets and applies that learning to automatically make predictions for new data.
Evaluation metrics for regression problems
R2
Adjusted R2
RMSE
Evaluation metrics for classification problems
Precision
Recall
F1 score
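A sketch computing these three metrics by hand from the confusion-matrix counts, assuming invented binary labels where 1 is the positive class:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Confusion-matrix counts.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of both
```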
What is another name for coefficient of determination?
R2
What is the formula for R2?
Sum of Squares Regression (SSR) / Sum of Squares Total (SST)
What does the Sum of Squares Regression represent?
The sum of the squared differences between the predictions and the mean of the target values
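A worked sketch of the formula on invented data, using a least-squares line from `np.polyfit`. For a least-squares fit with an intercept, SSR / SST equals 1 - SSE / SST, where SSE is the sum of squared residuals; the code checks both forms:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y ≈ slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

ssr = np.sum((y_hat - y.mean()) ** 2)  # predictions vs. mean of the target
sst = np.sum((y - y.mean()) ** 2)      # actual values vs. mean of the target
sse = np.sum((y - y_hat) ** 2)         # residuals

r2 = ssr / sst
r2_alt = 1 - sse / sst
print(r2, r2_alt)
```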