Principles of Data Science Vocabulary

Description and Tags

Flashcards for reviewing key vocabulary from Principles of Data Science.

62 Terms

1

Data

Pieces of evidence or observations that can be analyzed to provide some insights.

2

Data Science

A field of study that investigates how to collect, manage, and analyze data of all types in order to retrieve meaningful information.

3

Data analysis

The process of examining and interpreting raw data to uncover patterns, discover meaningful insights, and make informed decisions.

4

Data science cycle

A process, including problem definition, data collection, preparation, analysis, and reporting.

5

Data collection

The systematic process of gathering information on variables of interest.

6

Data reporting

The presentation of data in a way that will best convey the information learned from data analysis.

7

Data warehousing

Storing and managing large volumes of data from various sources in a central location for easier access and analysis by businesses.

8

Feature

An attribute or characteristic of an item; each item in a dataset is defined by a combination of features.

9

Quantitative data

Data that are measured and expressed using numbers.

10

Qualitative data

Non-numerical data that generally describe subjective attributes or characteristics.

11

Numeric data

Data represented in numbers that indicate measurable quantities.

12

Continuous data

Numeric data whose values can be any number within a range, not only values at a fixed precision.

13

Discrete data

Numeric data whose values follow a specific precision (for example, whole numbers), which makes the set of possible values countable and, within any bounded range, finite.

14

Categorical Data

Data represented in different forms such as words, symbols, and even numbers where a categorical value is chosen from a finite set of values, and the value does not necessarily indicate a measurable quantity.

15

Nominal Data

Data where the set of possible values does not include any ordering notion.

16

Ordinal Data

Data where the set of possible values includes an ordering notion.
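
The difference between nominal and ordinal data can be sketched in a few lines of Python. The size scale and color values below are hypothetical examples, not taken from the course material. Ordinal values can be sorted by their agreed order, while nominal values can only be compared for equality or counted.

```python
from collections import Counter

# Hypothetical ordinal scale: the list position encodes the ordering.
size_order = ["small", "medium", "large"]
sizes = ["large", "small", "medium", "small"]

# Ordinal data: sorting by position in the scale is meaningful.
sorted_sizes = sorted(sizes, key=size_order.index)
print(sorted_sizes)

# Hypothetical nominal values: no ordering, so we only count occurrences.
colors = ["red", "blue", "red", "green"]
color_counts = Counter(colors)
print(color_counts["red"])
```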

17

Dataset

A collection of observations or data entities organized for analysis and interpretation.

18

Unstructured dataset

A dataset that lacks a predefined or organized data model.

19

Structured dataset

A dataset organized in a tabular format with clearly defined fields and relationships.

20

Comma-Separated Values (CSV)

A file format that stores each item in the dataset on a single line, with the item's variable values separated by commas (,).
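
As an illustration, here is a minimal sketch of reading CSV text with Python's built-in `csv` module. The names, ages, and cities are made-up sample data, and the text is held in a string rather than a file so the example is self-contained.

```python
import csv
import io

# Hypothetical three-item dataset; the first line names the variables.
csv_text = (
    "name,age,city\n"
    "Ada,36,London\n"
    "Grace,45,New York\n"
    "Alan,41,Manchester\n"
)

# DictReader maps each line's comma-separated values to the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    # Every value is read as a string; convert numbers explicitly if needed.
    print(row["name"], int(row["age"]), row["city"])
```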

21

JavaScript Object Notation (JSON)

A data format that follows the object syntax of the JavaScript programming language, representing data as key-value pairs.
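
A small sketch of the format, using Python's built-in `json` module. The two records are made-up sample data.

```python
import json

# Hypothetical dataset: a JSON array holding two objects (key-value pairs).
json_text = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'

# json.loads turns the text into Python lists and dictionaries.
items = json.loads(json_text)
ages = [item["age"] for item in items]
print(ages)

# json.dumps converts the Python objects back into JSON text.
round_trip = json.dumps(items)
print(round_trip)
```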

22

Extensible Markup Language (XML)

A data format that lists each item of the dataset using markers called tags.
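
To show what tags look like in practice, the sketch below parses a small XML string with Python's built-in `xml.etree.ElementTree` module. The tag names (`people`, `person`, `name`, `age`) and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset: each <person> element is one item,
# and the nested tags hold that item's variable values.
xml_text = """
<people>
  <person><name>Ada</name><age>36</age></person>
  <person><name>Grace</name><age>45</age></person>
</people>
""".strip()

root = ET.fromstring(xml_text)

# Collect the text inside each item's <name> tag.
names = [person.findtext("name") for person in root.findall("person")]
print(names)
```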

23

Spreadsheet Programs

Programs that consist of electronic worksheets with rows and columns where data can be entered, manipulated, and calculated.

24

Programming Language

A formal language that consists of a set of instructions or commands used to communicate with a computer and to instruct it to perform specific tasks that may include data manipulation, computation, and input/output operations.

25

Pandas

A Python library specialized for data manipulation and analysis that is very commonly used among data scientists.

26

Google Colaboratory (Colab)

Google’s free web-based environment for editing and running Python programs interactively, combining programming code, math equations, visualizations, and plain text.

27

DataFrame

A data type that Pandas uses to store multi-column tabular data.

28

Recommendation systems

A system that makes data-driven, personalized suggestions for users.

29

Jupyter Notebook

A web-based document that helps users run Python programs more interactively.

30

Categorical data

Data that are represented in different forms and do not indicate measurable quantities.

31

Sabermetrics

A statistical approach to sports team management.

32

Sports analytics

The use of data and business analytics in sports.

33

Adaptive learning

A technique used in education that involves personalized material for each learner based on their past performance.

34

Internet of Things (IoT)

A network of multiple objects interacting with each other through the Internet.

35

Precision Medicine Initiative

A research endeavor with the goal of better understanding how a person's genetics, environment, and lifestyle can help determine the best approach to prevent or treat disease.

36

Qualitative data

Non-numerical data that generally describe subjective attributes or characteristics.

37

Quantitative data

Measurable data with specific quantities and amounts.

38

Predictive analytics

Statistical techniques, algorithms, and machine learning that analyze historical data and make predictions about future events.

39

Data analysis

The process of examining and interpreting raw data to uncover patterns, discover meaningful insights, and make informed decisions.

40

Problem definition

The first step in the data science cycle: a precise definition of the problem statement that establishes clear objectives for the goal and scope of the data analysis project.

41

Data collection

The systematic process of gathering information on variables of interest.

42

Data preparation

The step in the data science cycle that follows data collection; converts the collected data into an optimal form for analysis.

43

Data reporting

The presentation of data in a way that will best convey the information learned from data analysis.

44

Data visualization

The graphical representation of data, using visual elements such as charts, graphs, and maps, to point out patterns and trends.

45

Data warehousing

The process of storing and managing large volumes of data from various sources in a central location for easier access and analysis by businesses.

46

Attribute

A characteristic or feature that defines an item in a dataset.

47

Categorical data

Data whose values do not represent measurable quantities.

48

Jupyter Notebook

A web-based document that helps users run Python programs more easily.

49

Programming language

A set of instructions or commands used to communicate with a computer and instruct it to perform specific tasks.

50

Attribute

A characteristic or feature that defines an item in a dataset.

51

Extensible Markup Language (XML)

A dataset format that uses tags to mark up each item.

52

Sampling Bias

Occurs when the sample used in a study isn’t representative of the population it intends to generalize to, leading to skewed or inaccurate conclusions.

53

Inferential Statistics

Using sample data to make inferences, predictions, and generalizations about a larger population.

54

Regression Analysis

A method for modeling the relationship between a dependent variable and one or more independent variables.

55

Dependent Variable

The variable being predicted or explained in a regression analysis.

56

Independent Variables

Variables used to predict or explain the dependent variable in a regression analysis.

57

Linear Regression

A regression model that uses a straight line to model the relationship between variables.

58

R-squared

A measure of how well the regression model fits the data; the proportion of variance in the dependent variable that can be predicted from the independent variable(s).

59

P-value

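The probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; in a regression model, it is used to judge whether the relationship between the independent and dependent variables is statistically significant.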

60

Confidence Interval

A range of values that is likely to contain the true value of a population parameter with a certain level of confidence.
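
One common way to compute such a range for a population mean is sketched below. It assumes a normal approximation and uses a made-up sample; for a sample this small, a t-distribution would usually give a slightly wider and more appropriate interval.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
confidence = 0.95

# Critical value of the standard normal distribution (about 1.96 for 95%).
z = NormalDist().inv_cdf(0.5 + confidence / 2)

sample_mean = mean(sample)
# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = stdev(sample) / len(sample) ** 0.5

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"{confidence:.0%} confidence interval: ({lower:.3f}, {upper:.3f})")
```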

61

Regression Diagnostics

The process of evaluating the assumptions of a regression model and checking for any violations.

62

Residual

The difference between the observed value of the dependent variable and the value predicted by the regression model.
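
The regression terms above (dependent and independent variables, linear regression, residual, and R-squared) can be tied together in a short sketch. The data points are made up, and the line is fitted with the ordinary least squares formulas for a single independent variable, which is one standard way to fit a linear regression.

```python
# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares estimates for the line y = intercept + slope * x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Residual: observed value minus the value predicted by the model.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# R-squared: proportion of the variance in y explained by the model.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R-squared={r_squared:.4f}")
```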