DATA200 Exam 1

216 Terms

1

ethics and legality in an ideal world

would fully overlap

2

ethics and legality in the real world

only have a slight amount of overlap

3

data ethics

the set of principles and processes that guide the ethical collection, processing, analysis, use and application of data having an effect on human lives and society

- aim to create an ethical/moral code of conduct for data use and collection

4

data science

application of computational and statistical techniques to address/gain insight on a real-world problem

- data can be unstructured, structured, semi-structured

- data cleansing, prep, analysis
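
The three data shapes named on this card can be illustrated with a minimal Python sketch. The sample values and variable names (`structured_csv`, `semi_structured_json`, `unstructured_text`) are hypothetical and only show how each shape is typically read.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "name,city\nAda,London\nLin,Taipei\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but nesting and fields can vary per record.
semi_structured_json = '{"name": "Ada", "tags": ["math"], "address": {"city": "London"}}'
document = json.loads(semi_structured_json)

# Unstructured: free text with no predefined fields; it must be processed first.
unstructured_text = "Ada wrote that she moved to London last spring."
word_count = len(unstructured_text.split())

print(rows[0]["city"])              # value looked up by column name
print(document["address"]["city"])  # value looked up by nested key
print(word_count)                   # a simple feature derived from free text
```

Structured data can be queried by column right away, while unstructured data usually needs a cleansing or preparation step before analysis.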

5

Data Science Project Process

Identify

Design Research Plan

Collect data

Analyze data

Extract result

Publish/exploit results

6

where in the Data Science Project Process does "ethics approval" normally occur?

Designing your research plan

7

Step 1 DSPP

Identify hypothesis/question

Example: A study about online shopping recommendations. Researchers plan to collect data about users' shopping habits.

8

Step 2 DSPP

Design research plan

Are there any potential positive (more personalized shopping suggestions) or negative (participants might feel their privacy is invaded) impacts on participants? Ensure their well-being and informed consent. Researchers should consider both potential positive and negative impacts.

this is where ethics approval occurs

9

Step 3 DSPP

Collect Data

* How could bias in the data affect results? Could data be used for other purposes?

Bias in Data: Biased data can lead to biased results, affecting the fairness and accuracy of findings. (If data only comes from a certain demographic, the recommendations might only work well for that group, causing unfairness for others.)

Data Usage: Be aware that collected data might be repurposed for unintended uses that could harm privacy and consent. (The data might later be used to target users with ads without their consent.)

10

Step 4 DSPP

Analyze data

* Is the analysis of the data introducing bias? Are there any uncertainties or assumptions?

Bias in Analysis: Identify whether the analysis method introduces bias. (If researchers only focus on certain types of purchases, the recommendations could be biased towards those items, affecting the accuracy and fairness of results.)

Uncertainty and Assumptions: They impact the reliability and scope of the results. (Assuming that users' past purchases predict their future preferences might not hold true for everyone, introducing uncertainty into the results.)

11

Step 5 DSPP

Extract results

* Can the result be misinterpreted? Are the uncertainties, assumptions, biases properly represented?

Misinterpretation: Presenting improved shopping recommendations might be misinterpreted as an absolute guarantee of satisfaction when it's actually a probability.

Representing Uncertainties, Assumptions, Biases: State the limitations and potential sources of error to provide accurate context for the results. (The researchers should include a clear explanation of how they arrived at the recommendations, addressing potential biases and limitations.)
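
One possible way to represent uncertainty in a reported result is to publish an interval instead of a single number. The sketch below uses the normal-approximation (Wald) interval for a proportion; the function name `proportion_with_interval` and the survey numbers are hypothetical, and other interval methods may suit small samples better.

```python
import math

def proportion_with_interval(successes, trials, z=1.96):
    """Return (estimate, low, high) using the normal approximation.

    Assumption: trials is large enough for the approximation to be reasonable.
    z=1.96 corresponds to a 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    estimate = successes / trials
    margin = z * math.sqrt(estimate * (1 - estimate) / trials)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return estimate, max(0.0, estimate - margin), min(1.0, estimate + margin)

# Hypothetical numbers: 50 of 100 surveyed users liked their recommendations.
estimate, low, high = proportion_with_interval(50, 100)
print(f"{estimate:.0%} liked the recommendations "
      f"(95% interval: {low:.1%} to {high:.1%})")
```

Reporting the interval alongside the estimate makes it harder to read the result as a guarantee of satisfaction.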

12

Step 6 DSPP

Publish/Exploit Result

* Can the results be abused? Can disclosing or not disclosing them have ethical consequences?

Abuse of Results: Results could be used to cause harm or enable unethical actions. (For example, manipulating users into buying things they don't need.)

Ethical Consequences of Disclosure: Weighing the ethical implications of sharing or withholding certain results, and how they might affect individuals or society. It's like when you have a secret and you need to decide if you should tell your friend.

13

why is the Data Science Project Process important?

Because you're actively practicing data ethics, which involves not only complying with legal regulations but also making ethical choices that promote fairness, transparency, and the well-being of individuals and society. This approach enhances the credibility and impact of your research while minimizing potential negative consequences

14

ethics

shared principles guiding moral judgement

- cornerstone of civilization

15

ethics vs laws

ethics guide the creation of laws

- laws can be used to enforce ethical behaviors, but ethical values are not laws. even though it is not ethical to tell a secret, it is not against the law.

16

morals

individual's beliefs concerning what is right or wrong

- often shaped by cultural, religious, or personal values

- subjective and vary from person to person

example: vegetarianism

- it is not against the law or unethical to eat meat overall, vegetarians just believe that they do not want to consume it

17

data

any set of information that can be collected, stored, analyzed

18

big data

datasets with large volume, created and updated with high velocity, that have various structure and format

19

volume

the amount of data from sources

20

velocity

the speed at which big data is generated

21

variety

the types of data (structured, unstructured, semi-structured)

22

how do companies manage large datasets?

Apache Spark, Hadoop, cloud-based storage (AWS, Google Cloud)

23

how do data scientists deal with variety in data?

statistics, CS, machine learning, AI

24

why utilize data?

for better decision-making, customer-centric facilities, and target marketing

25

example of data facilitator

internet because it is used by all types of people and organizations

26

What can you analyze from this data?

- medical records

- patient demographics

- lab results

predict disease outbreaks

optimize treatment plans

provide insights for medical research

27

what data are you sharing?

social network friend's list

location

web searches

chat logs

IP addresses

web history

28

why are data ethics needed?

though data can be very helpful, people can misuse it; therefore, ethics need to be considered

- rules still evolving

29

decision-making and policy development

policies must be fair and based on reliable data

30

social and economic inequality

need to avoid systematic bias and ensure equitable outcomes

31

privacy and data protection

maintain individuals' privacy rights

32

societal impact of data science and the need for responsible practices

trust and transparency

ethical considerations in AI and automation

social good and public interest

data governance and regulation

33

benefits of data ethics

consistency

better data-driven decisions

increased transparency

34

consistency

helps all data users navigate the ethical considerations of data use

35

better data-driven decisions

uncover data limitations, gaps, and biases; facilitate justifiable decisions with data --> promote transparency

36

increased transparency

trustworthy data processes to increase the transparency of their data

37

structured data

names, addresses, credit card numbers, geolocation

38

unstructured data

photos, audio, video, social media posts

39

major sources of abundant data

business, science, society and everyone

40

why is data ethics needed?

data is a valuable asset that can build the next great business/innovation, but it is also a resource lots of organizations are not protecting or using ethically.

41

ethics

govern professional interactions

42

laws

govern society as a whole

43

morals

govern private, personal interactions

44

what are areas of concern in data ethics?

data collection

data ownership

data privacy

data anonymity

data validity

algorithmic and statistical fairness

45

data collection

process of gathering information from various sources

46

example of data collection ethics

ensuring that respondents give consent before providing their information

47

data ownership

the act of having legal rights and complete control to make decisions about data

48

example of data ownership ethics

you took a stunning sunset photo of me, but before you include it in a blog post, you ask for explicit permission from me to use the photo for a specific purpose

49

important lesson in data ownership

must ask for consent and respect the right of the data owner

50

data privacy

the protection and appropriate use of personal information

involves safeguarding individuals' personal data and ensuring that their privacy rights are respected

51

example of data privacy ethics

a customer gives the company consent to collect and store their PII, but that does not mean they want it publicly available

52

ways to implement data privacy

dual authentication passwords

file encryption

informed consent

53

data anonymity

preventing the identification of individuals within a dataset when handling and sharing data, so that data cannot be linked back to someone

54

confidentiality

centers around protecting data from unauthorized access or disclosure --> implementing security measures

55

data validity

the integrity and accuracy of data being collected and analyzed for various purposes

valid data represents real-world phenomena and is free from errors and biases

56

algorithmic fairness

ethical consideration and practice of ensuring that algorithms used in decision-making processes do not result in unfair or biased outcomes
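
One simple way to check for unfair outcomes is to compare how often each group receives a favorable decision. The sketch below is only an illustration: the helper names (`selection_rates`, `parity_gap`) and the sample decisions are hypothetical, and a gap in selection rates is one fairness signal among several, not a complete test.

```python
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, was_selected) pairs.

    Returns the share of favorable decisions for each group.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}

def parity_gap(rates):
    """Difference between the highest and lowest group selection rate."""
    return max(rates.values()) - min(rates.values())

# Hypothetical decisions produced by some algorithm for two groups.
decisions = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
rates = selection_rates(decisions)
print(rates)              # per-group selection rates
print(parity_gap(rates))  # a large gap is a prompt to investigate further
```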

57

misuse of statistics

unethical practice of distorting or manipulating statistical information to support a particular agenda or draw false conclusions

58

data confidentiality

ensuring that data is protected from unauthorized use.

implementing security measures to prevent data breaches

59

example of data privacy

social media privacy

60

example of data anonymity

healthcare data does not have names, just identification numbers
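
The idea of replacing names with identification numbers can be sketched as follows. The function `pseudonymize` and the field names (`name`, `patient_id`) are hypothetical. Note the assumption baked in: removing names alone does not guarantee anonymity, because other fields (for example date of birth plus postcode) may still identify a person.

```python
import secrets

def pseudonymize(records, name_field="name", id_field="patient_id"):
    """Return copies of records with the name replaced by a random ID.

    The same name always maps to the same ID within one call, so records
    belonging to one person can still be analyzed together.
    """
    id_by_name = {}
    used_ids = set()
    result = []
    for record in records:
        name = record[name_field]
        if name not in id_by_name:
            new_id = secrets.token_hex(4)
            while new_id in used_ids:  # regenerate on the rare collision
                new_id = secrets.token_hex(4)
            used_ids.add(new_id)
            id_by_name[name] = new_id
        cleaned = {k: v for k, v in record.items() if k != name_field}
        cleaned[id_field] = id_by_name[name]
        result.append(cleaned)
    return result

# Hypothetical lab records.
records = [
    {"name": "Ada Lovelace", "test": "glucose", "value": 5.1},
    {"name": "Alan Turing", "test": "glucose", "value": 4.8},
    {"name": "Ada Lovelace", "test": "iron", "value": 14.0},
]
anonymized = pseudonymize(records)
print(anonymized)
```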

61

example of data confidentiality

a bank employing encryption methods to protect data and implementing strict access controls

62

Facebook Cambridge Analytica Scandal

data acquisition through a third-party app called "This Is Your Digital Life"

63

Facebook Cambridge Analytica Scandal - Data harvesting from friends

the real ethical data breach occurred when the app also accessed personal data of users' Facebook friends

64

Facebook Cambridge Analytica Scandal - Scope of data collected

profile details, likes, friends' lists, users' psychological profiles and preferences

65

Facebook Cambridge Analytica Scandal - Purpose of data collected

used by Cambridge Analytica (a political consulting firm) to target political ads and to influence voting behavior

66

First AI Beauty Contest in 2016

use AI to judge and select winners based on the contestants' submitted photos

- winners were nearly all of one skin tone, although many people of color submitted photos

67

First AI Beauty Contest in 2016 - Violation of Fairness

exhibited racial bias by disproportionately selecting winners who were unrepresentative of global population

68

First AI Beauty Contest in 2016 - Underrepresentation

lack of diversity raises the question of whether the AI system was genuinely inclusive and fair in its assessment

69

First AI Beauty Contest in 2016 - Root causes of bias

data bias and algorithmic bias

70

data bias

lack of diversity in the training data for AI

71

algorithmic bias

if training data lacks diversity, the algorithm may not have learned to recognize and appreciate a broad range of beauty standards

72

concerns with TikTok

owned by a foreign company (ByteDance)

questions have been raised about where TikTok stores user data, especially given the foreign country ownership of the company

- potential for data access by the foreign government

- how does it select and prioritize recommended content

73

Facebook and mental health services

crisis intervention bot

74

crisis intervention bots

engage with users who express thoughts of self-harm or suicide and provide immediate support

75

the issue with Facebook and mental health services

visitors to websites often have their information collected even when the site is said to be anonymous

crisis line websites used Meta Pixel, and it seems visitors' data was being sent to Facebook and was no longer anonymous (pixel-based data could be unscrambled easily to reveal true identities)

76

Quest Diagnostics Breach 2019

significant cybersecurity incident where an unauthorized user gained access to the systems of a third-party billing collections vendor used by Quest and exposed personal and medical information of 12 million patients

77

scope of the Quest Diagnostics Breach 2019

one of the largest healthcare data breaches at the time

78

data exposed in Quest Diagnostics Breach 2019

names, addresses, phone #s, DOB, SSN, lab results

79

Quest Diagnostics Breach 2019 and its relation to complex ownership

highlights the challenges in determining who ultimately owns and is responsible for protecting patient data when so many parties are involved

80

BBC News: Facial recognition fails on race

NIST found that facial recognition software exhibited lower accuracy in identifying the faces of African Americans and Asian Americans compared to Caucasians

particularly pronounced in one-to-one matches

81

root cause of the bias in the BBC News facial recognition story

the facial recognition software was primarily trained on databases from govt. agencies (State Dept., FBI) consisting of primarily white individuals

therefore the lack of diversity in the training data resulted in the algorithm inheriting biases

82

lesson learned from the BBC News facial recognition story

critical need for representative datasets to address algorithmic bias
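
A rough way to check whether a dataset is representative is to compare each group's share of the data with its share of a reference population. In the sketch below, the function `representation_gaps`, the group labels, and the reference shares are all hypothetical; choosing an appropriate reference population is itself a judgment call.

```python
from collections import Counter

def representation_gaps(sample_groups, reference_shares):
    """Return dataset share minus reference share for each reference group.

    Negative values mean the group is underrepresented in the dataset.
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }

# Hypothetical training set: 8 images from group "x", 2 from group "y".
sample_groups = ["x"] * 8 + ["y"] * 2
# Hypothetical reference population in which both groups are equally common.
reference_shares = {"x": 0.5, "y": 0.5}

gaps = representation_gaps(sample_groups, reference_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%}")
```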

83

Quest Diagnostics data breach in 2019: what to consider

clear data ownership agreements

data access controls

data encryption

data minimization

security audits and assessments
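
Data minimization from the list above can be sketched as sharing only the fields a third party actually needs. The function `minimize`, the record, and the field names are hypothetical; which fields a billing vendor truly needs is an assumption made for illustration.

```python
def minimize(record, allowed_fields):
    """Return a copy of record that keeps only the explicitly allowed fields."""
    return {key: value for key, value in record.items() if key in allowed_fields}

# Hypothetical patient record held by a lab.
patient_record = {
    "name": "Ada Lovelace",
    "address": "12 Example Street",
    "ssn": "000-00-0000",
    "lab_results": {"glucose": 5.1},
    "amount_owed": 42.50,
}

# Assumption: a billing vendor needs contact and balance details only.
billing_fields = {"name", "address", "amount_owed"}
shared_with_vendor = minimize(patient_record, billing_fields)
print(shared_with_vendor)
```

With this approach, a breach at the vendor would expose fewer sensitive fields than sharing the full record.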

84

sources of data

primary and secondary

85

primary data

used when no related research has been done on the subject/topic

collecting brand new data

86

secondary data

data regarding a specific topic is readily available or collected by someone else

87

before collecting data, need to consider

the question you aim to answer

the data subjects you need to collect data from

the collection timeframe

data collection methods best suited to your needs

88

types of quantitative data

raw numbers and digits

ratio

interval

89

types of qualitative data

customer reviews or feedback

nominal and ordinal

90

how can qualitative data be collected?

answering questions (how much, how often)

one-on-one interviews, observations, focus group meetings, surveys

allows mathematical analysis

91

categorical data

grouped based on the categories

92

three classification types for categorical data

binary

nominal

ordinal

93

binary

only take two possible states (true/false)(yes/no)

94

nominal

labeled data classified into various groups with no ranks or order between them

(country, gender, hair color)

95

ordinal

groups based on order or ranking

(economic status)
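
The difference between binary, nominal, and ordinal data can be shown with a small Python sketch. The class names (`HairColor`, `EconomicStatus`) and their values are hypothetical; the point is that only ordinal categories support meaningful "greater than" comparisons.

```python
from enum import Enum, IntEnum

# Binary: exactly two possible states.
is_subscribed = True

class HairColor(Enum):
    """Nominal: labels with no rank or order between them."""
    BLACK = "black"
    BROWN = "brown"
    RED = "red"

class EconomicStatus(IntEnum):
    """Ordinal: categories with a meaningful order."""
    LOW = 1
    MIDDLE = 2
    HIGH = 3

# Ordinal values can be compared and sorted.
statuses = [EconomicStatus.HIGH, EconomicStatus.LOW, EconomicStatus.MIDDLE]
ordered = sorted(statuses)
print([status.name for status in ordered])

# Nominal values can only be tested for equality; ordering them is an error.
try:
    HairColor.BLACK < HairColor.BROWN
    nominal_is_orderable = True
except TypeError:
    nominal_is_orderable = False
print(nominal_is_orderable)
```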

96

first-party data

collected directly from users by your organization

(most valuable because you receive information about how your audience behaves, thinks, feels - all from a trusted source)

97

second-party data

data shared by another organization about its customers (or its first-party data)

98

third-party data

data that has been aggregated and rented or sold by orgs. that do not have a connection to your company or users

99

data collection methods

online survey

paper survey

interview and focus groups

forms

polls

voting

social media monitoring

online tracking

100

interviews

one-on-one conversations
