Flashcards for reviewing key vocabulary from Principles of Data Science.
Data
Pieces of evidence or observations that can be analyzed to provide insights.
Data Science
A field of study that investigates how to collect, manage, and analyze data of all types in order to retrieve meaningful information.
Data analysis
The process of examining and interpreting raw data to uncover patterns, discover meaningful insights, and make informed decisions.
Data science cycle
A process consisting of problem definition, data collection, data preparation, data analysis, and data reporting.
Data collection
The systematic process of gathering information on variables of interest.
Data reporting
The presentation of data in a way that will best convey the information learned from data analysis.
Data warehousing
Storing and managing large volumes of data from various sources in a central location for easier access and analysis by businesses.
Feature
A characteristic or attribute of the items in a dataset; each item is defined by a combination of such attributes or characteristics.
Quantitative data
Data that are measured and expressed using numbers.
Qualitative data
Non-numerical data that generally describe subjective attributes or characteristics.
Numeric data
Data represented in numbers that indicate measurable quantities.
Continuous data
Data whose values can be any number within a range, so the set of possible values is infinite.
Discrete data
Data where the values follow a specific precision, which makes the set of possible values finite.
Categorical Data
Data represented in different forms such as words, symbols, and even numbers, where a categorical value is chosen from a finite set of values and the value does not necessarily indicate a measurable quantity.
Nominal Data
Data where the set of possible values does not include any ordering notion.
Ordinal Data
Data where the set of possible values includes an ordering notion.
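The difference between nominal and ordinal data shows up as soon as values are sorted or compared. The sketch below uses made-up survey answers (the names `colors`, `sizes`, and `SIZE_ORDER` are hypothetical) and assumes the ordinal order has to be supplied explicitly, since Python would otherwise sort the words alphabetically.

```python
# Hypothetical survey answers, for illustration only.
colors = ["red", "blue", "green"]              # nominal: no natural order
sizes = ["large", "small", "medium", "small"]  # ordinal: small < medium < large

# For ordinal data the ordering must be stated explicitly;
# alphabetical order ("large" < "medium" < "small") would be wrong.
SIZE_ORDER = ["small", "medium", "large"]
sorted_sizes = sorted(sizes, key=SIZE_ORDER.index)
print(sorted_sizes)  # ['small', 'small', 'medium', 'large']

# Comparisons are meaningful for ordinal values...
print(SIZE_ORDER.index("medium") > SIZE_ORDER.index("small"))  # True

# ...while for nominal values only equality checks and counts make sense.
print(colors.count("red"))  # 1
```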
Dataset
A collection of observations or data entities organized for analysis and interpretation.
Unstructured dataset
A dataset that lacks a predefined or organized data model.
Structured dataset
A dataset organized in a tabular format with clearly defined fields and relationships.
Comma-Separated Values (CSV)
Stores each item of the dataset on a single line, with the item’s variable values separated by commas (,).
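As an illustration of the CSV layout, the following sketch parses a small, made-up dataset with Python's built-in `csv` module. The column names and values are assumptions for the example; note that the reader returns every value as a string.

```python
import csv
import io

# A tiny, made-up CSV dataset: a header line, then one item per line.
raw = """name,age,city
Alice,34,Boston
Bob,29,Denver
"""

# csv.DictReader maps each line's comma-separated values onto the header fields.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["name"], row["age"], row["city"])
```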
JavaScript Object Notation (JSON)
A format that stores data using the object syntax of the JavaScript programming language.
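A minimal sketch of reading and writing JSON with Python's standard `json` module; the records are made up. JSON arrays become Python lists and JSON objects become dictionaries.

```python
import json

# A made-up JSON dataset: a list of objects written in JavaScript object syntax.
raw = '[{"name": "Alice", "age": 34}, {"name": "Bob", "age": 29}]'

# json.loads turns JSON arrays into lists and JSON objects into dicts.
people = json.loads(raw)
print(people[1]["name"])  # Bob

# json.dumps converts Python data back into a JSON string.
print(json.dumps(people[0]))  # {"name": "Alice", "age": 34}
```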
Extensible Markup Language (XML)
A format that lists each item of the dataset using markup symbols called tags.
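To make the idea of tags concrete, here is a sketch that reads a made-up XML dataset with Python's built-in `xml.etree.ElementTree`. Each `<person>` element is one item, and its child tags hold that item's values; the tag names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# A made-up XML dataset: each <person> element is one item.
raw = """<people>
  <person><name>Alice</name><age>34</age></person>
  <person><name>Bob</name><age>29</age></person>
</people>"""

root = ET.fromstring(raw)

# findtext returns the text inside the first matching child tag.
names = [person.findtext("name") for person in root.findall("person")]
print(names)  # ['Alice', 'Bob']
```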
Spreadsheet Programs
Consist of electronic worksheets with rows and columns where data can be entered, manipulated, and calculated.
Programming Language
A formal language that consists of a set of instructions or commands used to communicate with a computer and to instruct it to perform specific tasks that may include data manipulation, computation, and input/output operations.
Pandas
A Python library for data manipulation and analysis that is widely used among data scientists.
Google Colaboratory (Colab)
Google’s free, web-based environment for running Python programs interactively, combining programming code, math equations, visualizations, and plain text.
DataFrame
A data type that Pandas uses to store multi-column tabular data.
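A short sketch of a Pandas DataFrame, assuming the `pandas` package is installed. The three columns and their values are made up; the example builds a DataFrame from a dictionary, then computes a column mean and filters rows by a condition.

```python
import pandas as pd

# A made-up dataset; each dictionary key becomes a column.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara"],
    "age": [34, 29, 41],
    "city": ["Boston", "Denver", "Boston"],
})

print(df.shape)          # (3, 3): 3 rows, 3 columns
print(df["age"].mean())  # mean of one column

# Select only the rows that match a condition.
print(df[df["city"] == "Boston"])
```

In practice a DataFrame is more often loaded from a file, for example with `pd.read_csv`, than typed in by hand.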
Recommendation systems
A system that makes data-driven, personalized suggestions for users.
Jupyter Notebook
A web-based document that helps users run Python programs more interactively.
Categorical data
Data that are represented in different forms and do not indicate measurable quantities.
Sabermetrics
A statistical approach to sports team management, originally developed for baseball.
Sports analytics
Use of data and business analytics in sports.
Adaptive learning
A technique used in education that involves personalized material for each learner based on their past performance.
Internet of Things (IoT)
Describes a network of multiple objects interacting with each other through the Internet.
Precision Medicine Initiative
A research endeavor with the goal of better understanding how a person's genetics, environment, and lifestyle can help determine the best approach to prevent or treat disease.
Quantitative data
Measurable data expressed as specific quantities and amounts.
Predictive analytics
Statistical techniques, algorithms, and machine learning that analyze historical data and make predictions about future events.
Problem definition
The first step in the data science cycle: a precise statement of the problem that establishes clear objectives for the goal and scope of the data analysis project.
Data preparation
The step of the data science cycle that follows data collection; converts the collected data into an optimal form for analysis.
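A minimal sketch of data preparation in plain Python. The records and the `prepare` helper are hypothetical: the sketch normalizes text, converts numbers stored as text into floats, and drops records with a missing value. Dropping is only one choice; filling missing values with an estimate is a common alternative.

```python
# Hypothetical raw records as they might arrive from data collection:
# stray whitespace, inconsistent capitalization, numbers stored as text,
# and one missing value.
raw_records = [
    {"city": " Boston ", "temp_f": "71"},
    {"city": "denver", "temp_f": ""},
    {"city": "Austin", "temp_f": "88.5"},
]

def prepare(records):
    """Normalize text, convert types, and drop records with a missing temperature."""
    cleaned = []
    for rec in records:
        temp = rec["temp_f"].strip()
        if not temp:  # skip records whose temperature is missing
            continue
        cleaned.append({
            "city": rec["city"].strip().title(),
            "temp_f": float(temp),
        })
    return cleaned

print(prepare(raw_records))
# [{'city': 'Boston', 'temp_f': 71.0}, {'city': 'Austin', 'temp_f': 88.5}]
```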
Data visualization
The graphical representation of data using visual elements such as charts, graphs, and maps to point out patterns and trends.
Attribute
A characteristic or feature that defines an item in a dataset.
Categorical data
Data whose values do not represent measurable quantities.
Jupyter Notebook
A web-based document that helps users run Python programs more easily.
Programming language
A set of instructions or commands used to communicate with a computer and instruct it to perform specific tasks.
Extensible Markup Language (XML)
A dataset format that uses tags.
Sampling Bias
Occurs when the sample used in a study isn’t representative of the population it intends to generalize to, leading to skewed or inaccurate conclusions.
Inferential Statistics
Using sample data to make inferences, predictions, and generalizations about a larger population.
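The two cards above can be illustrated with a small simulation. The population of ages is randomly generated (an assumption, not real data). A random sample gives a mean close to the population mean, while a sample drawn only from people over 60 is not representative and overestimates it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 made-up ages between 18 and 80.
population = [random.randint(18, 80) for _ in range(10_000)]

# Inferential statistics: estimate the population mean from a random sample.
random_sample = random.sample(population, 200)

# Sampling bias: a sample drawn only from people over 60 is not representative.
biased_sample = [age for age in population if age > 60][:200]

print(round(statistics.mean(population), 1))
print(round(statistics.mean(random_sample), 1))  # usually close to the population mean
print(round(statistics.mean(biased_sample), 1))  # systematically too high
```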
Regression Analysis
A method for modeling the relationship between a dependent variable and one or more independent variables.
Dependent Variable
The variable being predicted or explained in a regression analysis.
Independent Variables
Variables used to predict or explain the dependent variable in a regression analysis.
Linear Regression
A regression model that uses a straight line to model the relationship between variables.
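A sketch of simple linear regression fitted by ordinary least squares, in plain Python. The five data points are made up; `x` plays the role of the independent variable and `y` the dependent variable. The slope is the sum of cross-deviations divided by the sum of squared deviations of `x`.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: x is the independent variable, y the dependent variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # approximately 2.2 and 0.6
```

In Python 3.10 and later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.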
R-squared
A measure of how well the regression model fits the data; the proportion of variance in the dependent variable that can be predicted from the independent variable(s).
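R-squared can be computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between observed and predicted values and SS_tot is the sum of squared differences between observed values and their mean. The sketch below applies this to made-up data and the least-squares line y = 2.2 + 0.6x for that data.

```python
def r_squared(ys, predictions):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total variation
    return 1 - ss_res / ss_tot

# Made-up data and the least-squares line y = 2.2 + 0.6x fitted to it.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predictions = [2.2 + 0.6 * x for x in xs]
print(round(r_squared(ys, predictions), 3))  # 0.6
```

Here 60% of the variance in y is accounted for by x; the remaining 40% is left in the residuals.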
P-value
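In a regression model, the probability of observing a relationship at least as strong as the one in the data if there were in fact no relationship between the independent and dependent variables (the null hypothesis); a small p-value indicates a statistically significant relationship.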
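Regression software usually computes the p-value of a slope with a t-test. The sketch below swaps in a different technique, a permutation test, because it needs only the standard library and makes the "no relationship" assumption explicit: shuffling `y` breaks any real link to `x`. The data and the function names are hypothetical.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the regression slope.

    Each shuffle of ys simulates the null hypothesis of no relationship.
    The p-value is the share of shuffles whose slope is at least as
    extreme as the observed one (with +1 in numerator and denominator,
    a common convention that keeps the estimate above zero).
    """
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

# Made-up data with a clear upward trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 8.0, 8.8, 10.2, 11.1]
print(permutation_p_value(xs, ys))  # small: a trend this strong rarely arises by chance
```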
Confidence Interval
A range of values that is likely to contain the true value of a population parameter with a certain level of confidence.
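A sketch of a 95% confidence interval for a population mean, computed from a made-up sample. It assumes the normal approximation (critical value z, about 1.96 for 95%) from `statistics.NormalDist`; for small samples a t-distribution critical value is more appropriate, but the standard library does not provide one.

```python
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / n ** 0.5
    # z is the standard normal critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return mean - z * std_error, mean + z * std_error

# Made-up sample of 12 measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9, 5.2, 5.0]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, because a larger range is needed to contain the true value more often.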
Regression Diagnostics
The process of evaluating the assumptions of a regression model and checking for any violations.
Residual
The difference between the observed value of the dependent variable and the value predicted by the regression model.
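A minimal sketch of computing residuals and running one simple diagnostic check, using made-up data and the least-squares line y = 2.2 + 0.6x for that data. For a least-squares fit that includes an intercept, the residuals sum to zero (up to rounding error).

```python
# Made-up data and the least-squares line y = 2.2 + 0.6x fitted to it.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]

# Residual = observed value - predicted value.
residuals = [y - p for y, p in zip(ys, predicted)]
print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]

# Diagnostic check: with an intercept, least-squares residuals sum to about zero.
print(abs(sum(residuals)) < 1e-9)  # True

# Listing residuals next to x lets a reader look for patterns
# (curvature or growing spread would suggest a violated assumption).
for x, r in zip(xs, residuals):
    print(x, round(r, 2))
```

A fuller diagnosis would usually plot residuals against fitted values and check their distribution; those steps are omitted here.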