These flashcards cover key concepts, definitions, and terms relevant to the field of data science as introduced in the lecture.
Data Science
An interdisciplinary field that applies statistical theory and computer science to analyze data, solve problems, and predict outcomes.
Structured Data
Data organized according to a predefined format (schema), such as rows and columns in a database table, which makes it easy to search and query.
Unstructured Data
Data that is not organized in a predefined manner, such as text, images, and social media posts.
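To make the structured/unstructured contrast concrete, here is a minimal Python sketch using only the standard library. The CSV table, the names, and the social media post are hypothetical examples, not taken from the lecture. The structured records can be queried by a named field; the free-text post cannot.

```python
import csv
import io

# Structured: every record follows the same schema (name, city, age),
# so a question becomes a simple filter on a named field.
structured_csv = """name,city,age
Ana,Lisbon,34
Ben,Oslo,28
Caro,Lisbon,41
"""
rows = list(csv.DictReader(io.StringIO(structured_csv)))
lisbon = [r["name"] for r in rows if r["city"] == "Lisbon"]
print(lisbon)  # ['Ana', 'Caro']

# Unstructured: free text has no fields. A keyword search can tell us that
# Lisbon is mentioned, but not reliably *who* is there; answering that
# would need further text processing (e.g. NLP techniques).
unstructured_post = "Just landed in Lisbon with Ana! Ben stayed home in Oslo."
mentions_lisbon = "lisbon" in unstructured_post.lower()
print(mentions_lisbon)  # True
```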
Velocity
The speed at which data is generated and processed.
Variety
The different forms and sources of data, including structured and unstructured formats.
Data Engineering
The field focused on developing and maintaining systems that collect and store large amounts of data.
Data Analytics
The systematic computational analysis of data aimed at drawing conclusions and making decisions.
Business Intelligence (BI)
A set of tools and methodologies that assist organizations in accessing and analyzing historical and real-time data.
Machine Learning
A subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.
CRISP-DM Model
Short for Cross-Industry Standard Process for Data Mining; a process model that outlines the six phases of a data science project: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
Predictive Analytics
Analysis that uses data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes.
Descriptive Statistics
A branch of statistics that summarizes and describes the characteristics of a dataset.
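A short sketch of common descriptive statistics, computed with Python's standard `statistics` module on a hypothetical list of exam scores. These numbers only describe this particular dataset; they make no claim about any larger population.

```python
import statistics

# Hypothetical exam scores for a small class.
scores = [72, 85, 90, 66, 85, 78, 94]

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value when sorted
    "mode": statistics.mode(scores),      # most frequent value
    "stdev": statistics.stdev(scores),    # sample standard deviation (spread)
    "range": max(scores) - min(scores),   # largest minus smallest
}
print(summary)
```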
Inferential Statistics
A branch of statistics that makes inferences and predictions about a population based on a sample of data.
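By contrast with descriptive statistics, inferential statistics uses a sample to say something about a population. The sketch below estimates a population mean from a random sample and attaches a 95% confidence interval. The simulated "population" of heights is hypothetical, and the interval uses the large-sample normal approximation (z of about 1.96); for small samples a t-based interval would be more appropriate.

```python
import random
import statistics
from statistics import NormalDist

# Simulated population that, in practice, we could not observe in full.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Draw a random sample and use it to infer the population mean.
sample = rng.sample(population, 200)
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

# 95% confidence interval using the normal approximation.
z = NormalDist().inv_cdf(0.975)  # about 1.96
lower, upper = mean - z * se, mean + z * se
print(f"sample mean {mean:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```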
Feature Selection
The process of identifying and selecting a subset of relevant features for model construction.
Artificial Intelligence (AI)
The field that aims to create systems capable of performing tasks that would normally require human intelligence.
Natural Language Processing (NLP)
A field of AI that focuses on the interaction between computers and humans through natural language.
Data Mining
The process of discovering patterns and knowledge from large amounts of data.
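A classic data mining task is finding items that frequently appear together. The sketch below shows the support-counting step used in frequent-itemset methods such as Apriori, restricted to item pairs for brevity. The shopping baskets and the `min_support` threshold of 0.4 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (each set is one transaction).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]

# Count how many transactions contain each pair of items.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs whose support (fraction of transactions) meets the threshold.
min_support = 0.4
frequent = {pair: count / len(transactions)
            for pair, count in pair_counts.items()
            if count / len(transactions) >= min_support}
print(frequent)  # {('bread', 'milk'): 0.6, ('bread', 'butter'): 0.4}
```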
Validation
The process of evaluating a model's performance on data that was not used to train it, to estimate how well the model will generalize to new data.
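One common validation technique is k-fold cross-validation: the data is split into k folds, and each fold is held out once while the model is trained on the rest. The sketch below uses a deliberately trivial baseline "model" (predict the training mean) and mean absolute error; the helper names `k_fold_indices` and `cross_validate` and the data are hypothetical. In practice the data is usually shuffled before splitting, and a library such as scikit-learn would normally be used.

```python
import statistics

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(y, k):
    """Score a baseline model (predict the training mean) with k-fold CV.
    Each fold is held out once, so it is never seen during training."""
    errors = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = statistics.mean(train)  # "training" the baseline
        fold_mae = statistics.mean(abs(y[i] - prediction) for i in held_out)
        errors.append(fold_mae)
    return statistics.mean(errors)  # average error across folds

y = [3.1, 2.9, 3.4, 3.0, 2.8, 3.3, 3.2, 2.7, 3.5, 3.1]
print(round(cross_validate(y, k=5), 3))
```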
Bias in Data Science
Systematic errors, often introduced by human choices in how data is collected, labeled, or interpreted, that can skew the outcomes of data science models and lead to inaccurate predictions.
Causation vs. Correlation
Causation means that one event is the direct result of another; correlation means only that two variables are associated (they tend to change together). A correlation on its own does not establish causation.
Training Set
The portion of the dataset used to train the model.
Testing Set
The portion of the dataset held out from training and used to evaluate the model's performance on unseen data.
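The split into training and testing sets can be sketched as follows. `split_train_test` is a hypothetical hand-written helper (scikit-learn provides a comparable `train_test_split` function), and the 80/20 ratio is just a common convention, not a rule from the lecture. The data is shuffled first so that any ordering in the original records does not leak into one set.

```python
import random

def split_train_test(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)  # shuffle to avoid ordering bias
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

records = list(range(1, 11))  # ten hypothetical records
train, test = split_train_test(records, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the `seed` makes the split reproducible, so repeated runs evaluate the model on the same testing set.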
Key Performance Indicators (KPIs)
Quantifiable measures that gauge the performance of an organization in achieving business objectives.