A set of flashcards designed to help students prepare for their Data Science exam, covering key concepts and definitions.
What is Data Science?
A multidisciplinary field that uses various techniques, algorithms, processes, and systems to extract insights and knowledge from structured and unstructured data.
What are the main types of data?
Quantitative data (measurable quantities) and Qualitative data (characteristics and descriptors that cannot be easily measured).
What is the purpose of data preprocessing?
To clean, transform, and integrate data to make it suitable for analysis, ensuring high data quality.
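One common cleaning step is imputing missing values. A minimal sketch in plain Python, using a hypothetical list with gaps (the data and the mean-imputation choice are illustrative, not the only approach):

```python
# Hypothetical raw column with missing entries (None).
raw = [3.0, None, 5.0, None, 7.0]

# Impute each missing value with the mean of the observed values.
observed = [v for v in raw if v is not None]
mean = sum(observed) / len(observed)
cleaned = [v if v is not None else mean for v in raw]
```

In practice, libraries such as pandas provide this directly, but the idea is the same: replace gaps with a value derived from the data so downstream analysis does not fail on missing entries.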
Define supervised learning.
A type of machine learning where algorithms learn from labeled input data to make predictions or decisions.
What is the difference between accuracy and precision in model evaluation?
Accuracy is the ratio of correctly predicted observations to total observations, while precision is the ratio of correctly predicted positive observations to all predicted positives.
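The two ratios can be computed directly from a toy set of predictions (the labels below are made up for illustration):

```python
# Hypothetical ground truth and model predictions for a binary task.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

accuracy = correct / len(y_true)   # all correct / all predictions
precision = tp / (tp + fp)         # true positives / predicted positives
```

Here accuracy is 5/8 = 0.625 while precision is 3/5 = 0.6, showing the two metrics answer different questions.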
What is the purpose of a confusion matrix?
To visualize the performance of a classification algorithm by showing true positives, true negatives, false positives, and false negatives.
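The four cells can be counted by hand for a small example (same hypothetical labels as above, arranged in the usual [[TN, FP], [FN, TP]] layout):

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

# Rows = actual class, columns = predicted class.
matrix = [[tn, fp],
          [fn, tp]]
```

This matches the layout returned by `sklearn.metrics.confusion_matrix` for binary labels.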
Explain exploratory data analysis (EDA).
A process to analyze datasets to summarize their main characteristics, often using visual methods.
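A first EDA step is usually a summary of each column. A minimal sketch with pandas, on a hypothetical three-row dataset:

```python
import pandas as pd

# Illustrative toy dataset; column names are made up.
df = pd.DataFrame({"age": [20, 30, 40],
                   "income": [30000, 50000, 70000]})

# describe() reports count, mean, std, min, quartiles, and max per column.
summary = df.describe()
```

Visual EDA (histograms, scatter plots, box plots) typically follows these numeric summaries.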
What is feature scaling?
A technique used to standardize the range of independent variables or features of data, essential for algorithms sensitive to the scale of data.
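The two most common scalings, min-max normalization and standardization (z-scores), can be written out in plain Python on illustrative values:

```python
values = [10.0, 20.0, 30.0, 40.0]

# Min-max scaling: map the feature into the [0, 1] range.
lo, hi = min(values), max(values)
minmax = [(v - lo) / (hi - lo) for v in values]

# Standardization: zero mean, unit (population) standard deviation.
mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
standardized = [(v - mean) / std for v in values]
```

Distance-based algorithms such as k-nearest neighbors and gradient-based methods are the typical cases where scaling matters.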
What is the significance of the F1 score?
The F1 score is the harmonic mean of precision and recall, providing a balance between the two metrics, particularly in imbalanced datasets.
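The harmonic mean is easy to verify by hand; using the same hypothetical counts as the earlier cards (TP = 3, FP = 2, FN = 1):

```python
tp, fp, fn = 3, 2, 1

precision = tp / (tp + fp)   # 0.6
recall = tp / (tp + fn)      # 0.75
f1 = 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score well on F1 by excelling at only one of precision or recall.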
What is hyperparameter tuning?
The process of choosing a set of optimal hyperparameters for a learning algorithm to improve the model's performance.
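The simplest form is a grid search: try each candidate value, score it on held-out data, and keep the best. A minimal sketch with a hypothetical single hyperparameter (a decision threshold on model scores; all data below is made up):

```python
# Hypothetical validation scores from a model, with true labels.
val_scores = [0.1, 0.3, 0.55, 0.8, 0.65, 0.9]
val_labels = [0, 0, 1, 1, 1, 1]

def accuracy(threshold):
    """Validation accuracy when predicting 1 for scores >= threshold."""
    preds = [int(s >= threshold) for s in val_scores]
    return sum(p == y for p, y in zip(preds, val_labels)) / len(val_labels)

# Grid search: evaluate each candidate and keep the best.
candidates = [0.2, 0.5, 0.7]
best = max(candidates, key=accuracy)
```

Real workflows use the same idea with cross-validation, e.g. `GridSearchCV` or randomized search in scikit-learn.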
What is the role of decision trees in machine learning?
Decision trees are used for classification and regression tasks, providing a model that predicts the value of a target variable based on several decision rules inferred from the data features.
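A short scikit-learn sketch on a hypothetical four-point dataset, where the class is fully determined by the second feature so the tree only needs one decision rule:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy dataset: [feature_1, feature_2] -> class; class follows feature_2.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

pred = clf.predict([[0, 1]])[0]
```

On this separable data the fitted tree splits on the second feature and classifies the sample as class 1.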
Define principal component analysis (PCA).
A dimensionality reduction technique used to reduce the number of features in a dataset while preserving as much variance as possible.
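The core computation is an eigendecomposition of the covariance matrix of centered data. A sketch with NumPy on a hypothetical 2-D dataset whose variance is concentrated along the first axis:

```python
import numpy as np

# Toy 2-D data: more spread along axis 0 than axis 1 (illustrative).
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])

Xc = X - X.mean(axis=0)                 # 1. center the data
cov = Xc.T @ Xc / (len(Xc) - 1)         # 2. sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition (ascending)
order = np.argsort(eigvals)[::-1]       # 4. sort components by variance
components = eigvecs[:, order]
projected = Xc @ components[:, :1]      # 5. keep the top principal component
```

Each eigenvalue is the variance captured by its component, so keeping the top-k components preserves the largest share of total variance.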