Data Analytics- Quiz 1


Last updated 6:54 PM on 2/9/26

37 Terms

1

What is data analytics

The science of examining raw data to draw conclusions about that information

A process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making

2

Classical Statistics vs Data analytics

Classical statistics concentrates on the “average” effect

Data analytics often makes predictions on an individual level
– which ad to display for THAT customer
– how likely THAT student is to drop out of college

3

Business Analytics

the process by which businesses use statistical methods and technologies for analyzing historical data in order to gain new insight and improve strategic decision making

4

4 types of data analytics

  • Past

    • Descriptive analytics

    • Diagnostic analytics

  • Future

    • Predictive analytics

    • Prescriptive analytics

5

Descriptive analytics

Descriptive analytics- What is happening in my business

  • Summarize large data sets

  • Get essential insights

Past type

6

Diagnostic Analytics

Diagnostic analytics- Why is it happening?

  • Take descriptive analytics and dig deeper

  • Identify anomalies 

Past

7

Predictive Analytics

Predictive analytics- What will happen in the future?

  • Historical patterns being used to predict specific outcomes

  • Works on the assumption that business strategies have largely remained consistent over time

future

8

Prescriptive Analytics

Prescriptive analytics- What should be done?

  • Applies advanced analytical techniques and makes recommendations

  • By using insights from predictive analytics, data-driven decisions can be made

future

9

Know this graph about types of business analytics

[Figure: graph of the types of business analytics (image not included)]
10

Big Data

Refers to massive, complex structured and unstructured data sets that are rapidly generated and transmitted from a wide variety of sources.

11

4 V’s of Big Data

  • volume

  • variety

  • velocity

  • veracity

12

Volume (and bytes of measurement)

Data at Rest

  • Terabytes to exabytes of existing data to process

  • Measured in bytes

Byte - basic unit of measurement

Kilobyte (1024 bytes, or 2^10 bytes) - 30 KB is one page of text

Megabyte (about 1000 KB, or exactly 2^10 KB in binary units) - 5 MB is a piece of music

Gigabyte (about 1000 MB, or exactly 2^10 MB in binary units) - 1 GB is a two-hour film

Terabyte (about 1000 GB) - 1 TB is 6 million books

Petabyte (about 1000 TB) - 1 PB is a stack of DVDs as tall as a 55-story building

Exabyte (about 1000 PB) - 5 EB is all the information generated up to 2003

Zettabyte (about 1000 EB) - 1.8 ZB is all the recorded data in 2011

Yottabyte (1000 ZB) 
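
The unit ladder above can be sketched in Python. This is only an illustration, assuming the binary convention in which each unit is 2^10 = 1024 of the previous one; the name `to_human_readable` is a hypothetical helper, not something from the course material.

```python
# Unit names in ascending order; each step is assumed to be 2**10 of the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_human_readable(num_bytes: int) -> str:
    """Express a non-negative byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when no larger unit exists.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(to_human_readable(30 * 1024))      # the "one page of text" example from the card
print(to_human_readable(5 * 1024 ** 2))  # the "piece of music" example from the card
```

Under the decimal convention (steps of 1000) the same loop works with 1000 in place of 1024; the two conventions drift further apart at each larger unit.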

13

Variety

Data in many forms

  • Structured data- anything that can neatly be displayed in rows and columns

    • Requires less storage

    • Easier to manage and protect

  • Unstructured data - cannot be displayed in rows or columns- images, video, audio, word processing files

    • Requires more storage

    • More difficult to manage and protect

  • Text- words or written text (emails)

  • Multimedia- non-text (images, videos, audio)

14

Velocity

Data in Motion (speed at which data is processed)

  • Streaming data, milliseconds to seconds to respond

  • How fast data goes from point A to point B

  • Ex: Facebook users upload more than 900 million photos a day

  • Measurements

    • bits/sec = bps

    • kilobits/sec = kbps (10^3, or 1000 bps)

    • megabits/sec = Mbps (10^3 kbps, or 1,000,000 bps: a million bits per second)

    • gigabits/sec = Gbps (10^3 Mbps, or 1,000,000,000 bps: a billion bits per second)

  • Download speeds are usually faster than upload speeds: you can pull content down faster than you can push it up
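
A short worked example of the rate units above, in Python. Transfer rates are quoted in bits per second while file sizes are quoted in bytes (8 bits each), so the size has to be converted before dividing. The function name `transfer_seconds` and the file size are illustrative assumptions, and the sketch ignores protocol overhead and network congestion.

```python
BITS_PER_BYTE = 8

# Decimal rate units, as on the card: each step is 10**3 of the previous one.
KBPS = 10 ** 3
MBPS = 10 ** 6
GBPS = 10 ** 9


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal time to move size_bytes over a link running at rate_bps bits per second."""
    return size_bytes * BITS_PER_BYTE / rate_bps


one_gigabyte = 10 ** 9  # assumed decimal gigabyte, for round numbers

print(transfer_seconds(one_gigabyte, 100 * MBPS))  # seconds at 100 Mbps
print(transfer_seconds(one_gigabyte, 1 * GBPS))    # seconds at 1 Gbps
```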

15

Veracity

Data in doubt

Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations 

Refers to the quality and trustworthiness of data, including the absence of bias, noise, and abnormalities

16

Supervised vs Unsupervised learning

Supervised learning has a target, and you train the machine using data that is well “labeled”

  • involves building a model to estimate or predict an output based on one or more inputs

  • end goal: predict new values or understand existing relationships between explanatory and response variables

  • classification and regression

Unsupervised learning does not have a target, and you do not need to supervise the model

  • involves finding structure and relationships from inputs; there is no “supervising” output

  • end goal: place observations from a dataset into a specific cluster or create rules to identify associations between variables

  • clustering and association
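
The contrast can be sketched in plain Python with made-up numbers. The supervised half fits a least-squares line to labeled (input, target) pairs, which is a simple regression; the unsupervised half splits unlabeled values into two clusters with a basic two-centroid k-means loop. The function names `fit_line` and `two_means` and all of the data are illustrative assumptions, not part of the course material.

```python
def fit_line(xs, ys):
    """Supervised: learn slope and intercept from inputs xs and known targets ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def two_means(values, iterations=10):
    """Unsupervised: group 1-D values into two clusters; no targets are given.

    Assumes the values contain at least two distinct numbers.
    """
    c1, c2 = min(values), max(values)  # start the centroids at the extremes
    for _ in range(iterations):
        # Assign each value to its nearest centroid, then recompute the centroids.
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)


# Supervised: hours studied (input) with known exam scores (the labeled target).
slope, intercept = fit_line([1, 2, 3, 4], [52, 54, 56, 58])
predicted_score = slope * 5 + intercept  # predict a new value for 5 hours

# Unsupervised: only inputs; the algorithm finds the grouping on its own.
low_group, high_group = two_means([1, 2, 3, 10, 11, 12])

print(predicted_score, low_group, high_group)
```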

17

Data Privacy

branch of data security related to the proper collection, usage, and transmission of data

  • concerns around how data is legally collected and stored

  • if and how data are shared with third parties

  • how data collection, usage, and transmission meet regulations

also called information privacy

18

Three key principles of data privacy

  • Confidentiality

    • Customer data and identity remain private

    • Medical and financial data are highly sensitive

  • Transparency

    • Data processing and automated decisions are transparent

    • Risks, including social and ethical ones, are understood

  • Accountability

    • Reflective, reasonable, and systematic use and protection of data

    • protections against unauthorized or unlawful processing or accidental loss or destruction

19

Data Mining

  • a set of statistical and machine learning methods that inform decision making, often in an automated fashion

  • To the broad public, data mining may mean “digging through vast stores of data in search of something interesting”

  • Also known as predictive modeling 

20

Data Ethics

  • a branch of ethics that studies moral problems related to data

  • Evaluates if data are being used for doing the right thing for people and society.

    • there are two key considerations:
      a. Human first: the human being stays at the center, and human interests always outweigh institutional and commercial interests.
      b. No biases: the algorithms do not absorb biases or amplify them in analysis.

21

Data

compilations of facts, figures, or other content.
• Numerical and non-numerical.
• Often we have a large amount of data.
• Even small data can give insights

22

Information

Data that have been organized, analyzed, and processed in a meaningful and purposeful way

23

Knowledge

Derived from a blend of data, contextual information, experience, and intuition

24

Population


consists of all items of interest in an analytics application.

• Usually not feasible to collect data that comprise an entire population.
• Doing so is too expensive, or the population is too big.

25

Sample

a subset of the population.
• Representative of the population.
• Compute a sample statistic to estimate the unknown population parameter.
• Make inferences about the unknown population parameter
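
The idea of estimating an unknown population parameter from a sample statistic can be sketched in Python. The population here is an artificial one (the integers 1 to 10,000) so that the true mean is known and can be compared with the estimate; the seed and sample size are arbitrary assumptions chosen for the illustration.

```python
import random
import statistics

# Artificial population, so the "unknown" parameter can be checked afterwards.
population = list(range(1, 10_001))
population_mean = statistics.mean(population)  # the population parameter

# Draw a simple random sample without replacement; a fixed seed makes it repeatable.
rng = random.Random(42)
sample = rng.sample(population, 400)

# The sample statistic is used as an estimate of the population parameter.
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean}")
print(f"sample mean:     {sample_mean}")
print(f"estimation error: {sample_mean - population_mean}")
```

In a real application only the sample is available, so the error in the last line cannot be computed directly; inference methods are used to estimate how large it is likely to be.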

26

cross-sectional data

Record a characteristic of many subjects at the same point in time, or without regard to time.

People, households, firms, industries, regions.

  • ex: Batting averages of all MLB players during the 2025 season

  • Home runs hit by each team in a single season

normally shown as some type of chart, such as a bar graph

27

time series data

Collected over several time periods focusing on certain groups of people, specific events, or objects.

Hourly, daily, weekly, monthly, quarterly, or annual observations.

  • Attendance at a stadium each home game over a season

normally a line graph
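
A small Python sketch of how the two kinds of data differ in shape. Cross-sectional data are keyed by subject at one point in time; time series data are ordered by time period for one subject. All team names, dates, and numbers below are made up for illustration.

```python
# Cross-sectional: many subjects, one season (hypothetical values).
home_runs_by_team = {"Team A": 210, "Team B": 185, "Team C": 198}

# Time series: one stadium, several home games in date order (hypothetical values).
attendance_by_game = [
    ("2025-04-01", 31000),
    ("2025-04-02", 28500),
    ("2025-04-04", 33250),
]

# A typical cross-sectional question compares subjects with each other.
top_team = max(home_runs_by_team, key=home_runs_by_team.get)

# A typical time-series question asks how the value changed between periods.
changes = [
    later[1] - earlier[1]
    for earlier, later in zip(attendance_by_game, attendance_by_game[1:])
]

print(top_team)
print(changes)
```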

28

types of data (human or machine)

• Data can be human- or machine-generated.


• Structured human: price, income, retail sales.
• Structured machine: sensors, speed cameras, web server logs.
• Unstructured human: email, text, social media, presentations.
• Unstructured machine: satellite images, video data, camera images

29

Variable

characteristic of interest that differs in kind or degree among various observations

two types of variables

  • categorical or qualitative (categories, names, labels)

  • numeric or quantitative (number values)

    • Discrete: assumes a countable number of values. (number of goals in a soccer game, number of emails today)

    • Continuous: assumes an uncountable number of values within an interval. (height, weight)

30

File formats

Formatting data in a standardized manner allows people to understand data files.

Two common layouts for text files:

  • Fixed-width format.

  • Delimited format.

31

Fixed width format

each column starts and ends at the same place in every row.
• Specific data can be found at the exact same location for every record.
• The data are stored as plain text characters.
• Simple files that are smaller in size.
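
A minimal Python sketch of reading fixed-width records by column position. The layout (name in columns 0-9, city in columns 10-19, age in columns 20-22), the helper names, and the records are all hypothetical; a real file would come with its own layout specification.

```python
# Hypothetical layout: field name -> (start column, end column), end exclusive.
LAYOUT = {"name": (0, 10), "city": (10, 20), "age": (20, 23)}


def format_fixed_width(name: str, city: str, age: int) -> str:
    """Write one record, padding every field to its fixed width."""
    return f"{name:<10}{city:<10}{age:>3}"


def parse_fixed_width(line: str) -> dict:
    """Read one record by slicing the same column positions in every row."""
    return {
        field: line[start:end].strip()
        for field, (start, end) in LAYOUT.items()
    }


lines = [
    format_fixed_width("Ana", "Boston", 34),
    format_fixed_width("Benjamin", "Chicago", 27),
]

for line in lines:
    print(repr(line), "->", parse_fixed_width(line))
```

Because every field sits at a known offset, no separator characters are needed, but values longer than the field width cannot be stored without changing the layout.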

32

Delimited format

each column is separated by a delimiter

Delimiter is a character to separate fields.
• A comma is typical, giving CSV (comma-separated values) files.
• Each column can contain as many characters as applicable
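
A minimal Python sketch of reading comma-delimited text with the standard library's `csv` module. The column names and records are made up for illustration. The second record shows that a field can be any length and can even contain the delimiter itself when it is wrapped in double quotes.

```python
import csv
import io

# Hypothetical CSV content; the first line holds the column names.
raw = 'name,city,age\nAna,Boston,34\nBenjamin,"Chicago, IL",27\n'

# DictReader splits each line on the delimiter and maps values to column names.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["name"], "|", row["city"], "|", row["age"])
```

Every value is read as text, so numeric columns such as `age` need an explicit conversion (for example `int(row["age"])`) before doing arithmetic.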

33

Extensible Markup Language (XML)

a simple text-based markup language for representing structured data

• XML is widely used to share structured information.
• It uses user-defined markup tags to specify the structure of data.
• Each piece of data is enclosed in a pair of tags.
• Designed to support readability.
• Because of tags, files are much larger
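
A minimal Python sketch of the points above, using the standard library's `xml.etree.ElementTree`. The tag names (`players`, `player`, `name`, `team`) are user-defined and chosen only for this illustration, as is the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every piece of data sits inside a pair of tags.
xml_text = """
<players>
  <player>
    <name>Ana</name>
    <team>Team A</team>
  </player>
  <player>
    <name>Benjamin</name>
    <team>Team B</team>
  </player>
</players>
""".strip()

root = ET.fromstring(xml_text)

# Turn each <player> element into a dictionary of tag -> text.
players = [
    {child.tag: child.text for child in player}
    for player in root.findall("player")
]

print(players)
```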

34

HyperText Markup Language (HTML)

is a mark-up language that uses tags to define data for web pages

• Gives information on how to display the data.
• Different tags for different elements.
• Conforms to standards maintained by organizations such as the World Wide Web Consortium.
• For example, <table> provides structure for textual data
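
A minimal Python sketch of pulling the textual data out of a `<table>`, using the standard library's `html.parser`. The class name `TableCellCollector` and the table contents are illustrative assumptions, and the sketch assumes a simple, well-formed table in which every cell sits inside a `<tr>` row.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # a new row starts
        elif tag in ("td", "th"):
            self._in_cell = True   # text that follows belongs to a cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self.rows[-1].append(data.strip())


# Hypothetical page fragment.
html_text = (
    "<table>"
    "<tr><th>Team</th><th>Home runs</th></tr>"
    "<tr><td>Team A</td><td>210</td></tr>"
    "</table>"
)

parser = TableCellCollector()
parser.feed(html_text)
print(parser.rows)
```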

35

JavaScript Object Notation (JSON)

a standard for transmitting human-readable data

• Popular alternative to XML.
• Supported by many programming languages such as C and Python.
• Not as verbose as XML, making files smaller.
• Supports wide range of data types.
• Parsing is faster and less resource intensive
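
A minimal Python sketch using the standard library's `json` module. The record is made up, and the XML string is one hypothetical way of encoding the same record, included only to compare sizes. It shows the data types surviving a round trip (text, number, boolean, list) and the JSON text being shorter than the tagged version.

```python
import json

# Hypothetical record mixing several data types.
record = {
    "name": "Ana",
    "team": "Team A",
    "home_runs": 12,
    "active": True,
    "positions": ["1B", "DH"],
}

json_text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(json_text)  # JSON text -> Python object

# One possible XML encoding of the same record (an assumption for comparison).
xml_text = (
    "<player><name>Ana</name><team>Team A</team>"
    "<home_runs>12</home_runs><active>true</active>"
    "<positions><position>1B</position><position>DH</position></positions>"
    "</player>"
)

print(json_text)
print(len(json_text), "characters as JSON vs", len(xml_text), "as XML")
```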

36

Generative AI

AI that creates new content, such as music and text

37

Interval vs Ratio scale

interval- values can be categorized and ranked, and the differences between them are meaningful (60° is 10° hotter than 50°), but there is no true zero point

ratio- same as interval, but 0 is a true zero, so ratios are meaningful (sales, profit, inventory, weight, time, distance)

ratio is the strongest level of measurement
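
A short worked example of why the zero point matters, sketched in Python. On an interval scale such as temperature in degrees Fahrenheit, the ratio between two readings changes when the unit changes, so "60° is 1.2 times as hot as 50°" is not a meaningful statement. On a ratio scale such as weight, the ratio is the same in any unit. The helper names are illustrative, and the pounds-per-kilogram factor is an approximation.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    return (deg_f - 32) * 5 / 9


POUNDS_PER_KG = 2.20462  # approximate conversion factor


def kg_to_lb(kg: float) -> float:
    return kg * POUNDS_PER_KG


# Interval scale: the ratio depends on the unit, so it carries no meaning.
ratio_f = 60 / 50
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(50)

# Ratio scale: the ratio is the same whichever unit is used.
ratio_kg = 60 / 50
ratio_lb = kg_to_lb(60) / kg_to_lb(50)

print(f"temperature ratio: {ratio_f:.3f} in F vs {ratio_c:.3f} in C")
print(f"weight ratio:      {ratio_kg:.3f} in kg vs {ratio_lb:.3f} in lb")
```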