ALL QUIZ BI

72 Terms

1

data unit

In a database, a ___ is called a record (also known as a row or tuple). It is a single, structured entry stored in a table.

2

dataset

A ____ is a collection of related data units and information that is composed of separate elements but can be manipulated as a unit by a computer. A ____ is normally presented in tabular form.

3

Data item

It is the equivalent of a column in a spreadsheet, while a dataset is the equivalent of a worksheet.

4

dataset

A ____ is a collection of related data units and information that is composed of separate elements but can be manipulated as a unit by a computer. This set is normally presented in tabular form. It is also known as a table in a database management system.

5

Data Warehousing

It integrates data and information collected from various sources into one comprehensive database.

6

ETL (Extract, Transform, Load)

It is the process of extracting data from various sources, transforming it into a consistent format, and loading it into a data warehouse or another storage system for analysis. ___ is a process used in data integration and data warehousing.
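
As a rough illustration of the three ETL steps, here is a minimal Python sketch. The sample CSV text, the `sales` table, and the column names are made up for this example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data with inconsistent casing and stray whitespace.
RAW_CSV = """region,amount
 north ,100.50
SOUTH,200
North,50.25
"""

def extract(text):
    """Extract: read rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: bring every row into one consistent format."""
    return [(r["region"].strip().title(), float(r["amount"])) for r in rows]

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)

# After loading, the data can be analyzed in one consistent format.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

Without the transform step, " north ", "SOUTH", and "North" would be counted as three different regions.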

7

Data Lake

It is a centralized repository that allows organizations to store all their structured and unstructured data at any scale.

8

Data visualization

It is the graphical representation of data to facilitate understanding, analysis, and interpretation. It involves presenting data in visual formats such as charts, graphs, maps, and dashboards to communicate complex information clearly and effectively.

9

Data mining

It is the process of searching and analyzing a large batch of raw data in order to identify patterns and extract useful information.

10

Machine learning

It is a branch of artificial intelligence (AI) and computer science which focuses on the development of algorithms and statistical models that enable computers to learn and improve their performance on a specific task without being explicitly programmed.

11

Big data

It is a combination of structured, semi-structured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.

12

Data Analytics

____ is a wider concept of which data analysis is just one part. ___ is the broad field of using data and tools to make business decisions. Data analysis, a subset of _____, refers to specific actions.

13

Data science

It combines mathematics and statistics, scientific methods, algorithms, specialized programming (information technology), advanced analytics, artificial intelligence (AI), and machine learning (ML) with specific subject matter expertise to uncover actionable insights hidden in an organization's data.

14

Deep learning

It is a subset of machine learning, which itself is a subset of artificial intelligence (AI). It involves using neural networks with many layers (hence "deep") to model and understand complex patterns in data.

15

Artificial intelligence (AI)

It is the simulation of human intelligence processes by machines, especially computer systems.

16

Internet of Things (IoT)

The ____ describes the network of physical objects ("things") that are embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet.

17

ChatGPT

It is an artificial intelligence (AI) chatbot that uses natural language processing to create humanlike conversational dialogue.

18

Spreadsheet

It is a computer application or program used to organize, display, analyze, compute, manipulate, and store data in a tabular format, typically presented in rows and columns. The most popular spreadsheet applications today are MS Excel, Google Sheets, Apple Numbers, and LibreOffice Calc.

19

Database Management System

Often abbreviated as DBMS, this is a software system that enables users to define, create, maintain, manipulate, and manage databases. Some of the most popular large-scale _____(s) are Microsoft SQL Server, MySQL, Oracle, Teradata, DB2 (Mainframe), and Adabas (Mainframe).

20

Databases

These store structured data in a format optimized for efficient storage, retrieval, and manipulation. Structured data is typically stored in tabular form and managed in a relational database (RDBMS).

21

Data Visualization Tools

These are software applications or platforms that allow users to create visual representations of data. Some popular data visualization tools are Microsoft Power BI, Tableau, Google Data Studio (now Looker Studio), and QlikView.

22

Various approaches to data analytics include:
a. looking at what happened (descriptive analytics)
b. why something happened (diagnostic analytics)
c. what is going to happen (predictive analytics), or
d. what should be done next (prescriptive analytics).

Various approaches to data analytics include:
a. looking at what happened (?)
b. why something happened (?)
c. what is going to happen (?), or
d. what should be done next (?).

31

Descriptive analytics

It involves analyzing historical data to understand and examine what happened in the past. It focuses on summarizing and visualizing data to provide insights into trends, patterns, and relationships.
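
A small Python sketch of what summarizing historical data can look like. The monthly sales figures are invented for illustration, and only the standard library is used.

```python
from statistics import mean, median

# Hypothetical historical data: units sold per month.
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110}
values = list(monthly_sales.values())

# Descriptive analytics summarizes what already happened.
summary = {
    "total": sum(values),
    "mean": mean(values),
    "median": median(values),
    "best_month": max(monthly_sales, key=monthly_sales.get),
}
print(summary)
```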

32

Diagnostic analytics

It helps explain why things happened the way they did. It's a more complex version of descriptive analytics, extending beyond what happened to why it happened.

33

Diagnostic analytics

It involves digging deeper into historical data to understand why certain events occurred.

34

Predictive analytics

It aims to predict likely outcomes and make educated forecasts using historical data. Simply put, it seeks to answer the question, "What will happen?" _____ uses probabilities instead of simply interpreting existing facts.
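
One simple way to forecast from historical data is to fit a straight-line trend with ordinary least squares. The sketch below assumes made-up monthly sales and is only one of many possible forecasting techniques; real predictive models usually also quantify uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100, 110, 120, 130]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # educated forecast for month 5
print(forecast)
```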

35

Prescriptive analytics

It is the use of advanced processes and tools to analyze data and content to recommend the optimal course of action or strategy moving forward. _____ is the most advanced of the four types of data analytics.

36

Predicting future trends
Optimizing business operations
Enhancing decision-making
Transforming raw data

The primary goals of data analytics are:

40

Cloud Computing

It is the delivery of different services through the Internet. These resources include tools and applications like data storage, servers, databases, networking, and software. Database as a Service (DBaaS) is a _____ service model that provides users with access to managed database services over the internet. Some of the leading cloud service providers are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

41

Business Intelligence

It is about descriptive and diagnostic analytics, while Business Analytics is about predictive and prescriptive analytics. ____ (BI) and Business Analytics (BA) are both subsets of Data Analytics.

42

Structured Query Language (SQL)

It is a standard language for managing and manipulating relational databases. Mastering ___ empowers data analysts to derive insights from large datasets and optimize the performance of data-related operations.
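
To make this concrete, here is a short sketch that runs a SQL query from Python using the standard `sqlite3` module. The `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ana", 50.0), ("Ben", 20.0), ("Ana", 30.0)],
)

# Total spend per customer, keeping only customers above 25.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 25
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

`GROUP BY` aggregates the detail rows per customer, and `HAVING` filters on the aggregated value.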

43

Data quality

determines the usability and trustworthiness of data.

44

quality data

The characteristics of ____ are validity, accuracy, completeness, consistency, timeliness, relevance, and reliability.

45

Data quality

____ issues include incomplete, duplicate, outdated, insecure, inaccurate, incorrect, inconsistent, and outlier data.

46

Data Validity

refers to the degree to which your data conforms to defined business rules or constraints.
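
A minimal sketch of checking records against business rules. The two rules here (an age range and a very loose email check) are assumptions chosen only for illustration.

```python
# Hypothetical business rules: each maps a field name to a check.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def invalid_fields(record):
    """Return the names of fields that break a rule."""
    return [field for field, is_valid in RULES.items()
            if not is_valid(record.get(field))]

good = invalid_fields({"age": 34, "email": "ana@example.com"})
bad = invalid_fields({"age": -5, "email": "not-an-email"})
print(good, bad)
```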

47

Data Accuracy

ensures that your data is close to the true values.

48

Data consistency

refers to the degree to which the same data agrees wherever it appears. _____ ensures your data is stable within the same data set and/or across multiple data sets. ____ occurs when aggregated data is reconciled with detailed data at lower levels of granularity.
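
The reconciliation of aggregated data with detailed data can be sketched as a simple comparison. The invoice lines and the reported total below are invented, and `Decimal` is used so that currency amounts compare exactly.

```python
from decimal import Decimal

# Hypothetical detail rows (invoice lines) and the aggregate stored elsewhere.
detail_amounts = [Decimal("19.99"), Decimal("5.01"), Decimal("25.00")]
reported_total = Decimal("50.00")

# The aggregate reconciles if it equals the sum of the detail.
detail_total = sum(detail_amounts)
is_reconciled = detail_total == reported_total
print(detail_total, is_reconciled)
```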

49

Data Uniformity

refers to the degree to which data is specified using the same unit of measure.
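
As an illustration, mixed units can be converted to one unit of measure before analysis. The sample weights are hypothetical. The conversion factors are standard.

```python
# Conversion factors to kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kilograms(value, unit):
    """Convert a weight to kilograms so all values share one unit."""
    return value * TO_KG[unit]

# Hypothetical readings recorded in different units.
weights = [(2, "kg"), (500, "g"), (1, "lb")]
uniform = [round(to_kilograms(v, u), 3) for v, u in weights]
print(uniform)
```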

50

Duplicate data

Also known as data redundancy, it occurs when the same information is entered multiple times, sometimes in different formats. It can be avoided by implementing record validation checks within a program to ensure that a record does not already exist before it is added to a dataset or database.
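
A record validation check of this kind might look like the sketch below. Using a normalized email address as the unique key is an assumption made for the example.

```python
def normalize(email):
    """Reduce format differences so the same address always matches."""
    return email.strip().lower()

def add_customer(customers, name, email):
    """Add a record only if it does not already exist."""
    key = normalize(email)
    if key in customers:
        return False  # duplicate: same record, possibly in another format
    customers[key] = {"name": name, "email": key}
    return True

customers = {}
first = add_customer(customers, "Ana Cruz", "ana@example.com")
second = add_customer(customers, "Ana Cruz", " ANA@Example.com ")
print(first, second, len(customers))
```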

51

Outlier data

refers to observations or data points whose values deviate significantly from the rest of the data set.
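
One common rule of thumb (not the only one) flags values lying more than 1.5 times the interquartile range outside the quartiles. The sketch below applies it to made-up numbers using `statistics.quantiles`, which requires Python 3.8 or later.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Flag values outside the 1.5 * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```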

52

Insecure data

refers to sensitive data that is not encrypted or access-controlled.

53

Incomplete data

occurs when you don't have data stored for certain variables or data items.

54

Incorrect data

can easily be prevented when data validation is in place.

55

Inconsistent data

occurs when there are multiple tables within a database that deal with the same data but may receive it from different inputs.

56

Inaccurate data

refers to data that contains errors and discrepancies that deviate from the true or expected values.

57

Constructive Transformation process

where data items are added, copied, or replicated.

58

Destructive Transformation process

where data items or records are trimmed or deleted.

59

Aesthetic Transformation process

where certain values are standardized to meet requirements or parameters.

60

Structural transformation process

which includes columns being renamed, moved, and combined.
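
The four transformation types (constructive, destructive, aesthetic, structural) can be sketched on a single record. The field names and the `"crm"` source label are hypothetical.

```python
def transform(record):
    out = dict(record)  # work on a copy, leave the source untouched

    # Constructive: a data item is added.
    out["source"] = "crm"

    # Destructive: a data item is deleted.
    del out["temp_note"]

    # Aesthetic: values are standardized to meet a requirement.
    out["country"] = out["country"].upper()

    # Structural: a column is renamed, and two columns are combined.
    out["country_code"] = out.pop("country")
    out["full_name"] = f"{out.pop('first')} {out.pop('last')}".title()
    return out

raw = {"first": "ana", "last": "cruz", "country": "ph", "temp_note": "x"}
clean = transform(raw)
print(clean)
```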

61

Data Cleaning

is also known as data cleansing and data scrubbing.

62

Garbage-In Garbage-Out or GIGO

simply means the quality of output is determined by the quality of the input.

63

data completeness

____ is likely to be achieved when you make the important fields mandatory in both data entry and the data model.
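
In a data model, a mandatory field is typically expressed as a `NOT NULL` constraint. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `customers` table in which `name` and `email` are mandatory and `phone` is optional.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "name TEXT NOT NULL, email TEXT NOT NULL, phone TEXT)"
)

# An optional field may be left empty.
conn.execute("INSERT INTO customers VALUES (?, ?, ?)",
             ("Ana", "ana@example.com", None))

# A missing mandatory field is rejected by the database.
try:
    conn.execute("INSERT INTO customers VALUES (?, ?, ?)",
                 ("Ben", None, "555-0100"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rejected, stored)
```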

64

Data timeliness

refers to data that is available when it is required.

65

discovery stage

At the ________, data teams work to understand, identify, and find all applicable raw data and data types that need to be transformed. Data discovery includes identifying and understanding data in its original source format with the help of data profiling tools.

66

data mapping stage

At the ________, data teams determine how individual fields are matched, filtered, joined, modified, and aggregated.

67

extraction stage

At the _______, data teams move data from its source system into the staging areas.

68

code generation and execution stage

At the _____ and _____, data teams generate and execute programs or code based on the mapping process, using a programming language.

69

review output stage

At the _______, the transformed data is evaluated by the data teams to ensure the conversion has had the desired results in terms of the format of the data.

70

target stage

The send to _________ involves sending the transformed data to its target destination.

71

Data profiling

involves identifying patterns and inconsistencies in data. _______ helps identify data quality issues and assess the overall quality of the data.
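
A very small profiling pass might count missing values, distinct values, and value types per column. This is only a sketch over invented rows; dedicated profiling tools report far more.

```python
def profile(rows):
    """Summarize each column: missing count, distinct count, value types."""
    report = {}
    for column in rows[0].keys():
        values = [row.get(column) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return report

# Hypothetical rows with a mixed-type column and an empty value.
rows = [
    {"id": 1, "age": 34, "city": "Manila"},
    {"id": 2, "age": "n/a", "city": ""},
    {"id": 3, "age": 29, "city": "Manila"},
]
report = profile(rows)
print(report)
```

Here the mixed types in `age` and the missing value in `city` are the kinds of inconsistencies profiling is meant to surface.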

72

Data set

The _________ must be updated or refreshed to replace the obsolete data with the newer data.