DATA200 Exam 1

Last updated 4:15 PM on 3/12/25

216 Terms

1

ethics and legality in an ideal world

would fully overlap

2

ethics and legality in the real world

only have a slight amount of overlap

3

data ethics

the set of principles and processes that guide the ethical collection, processing, analysis, use, and application of data that affects human lives and society

- aims to create an ethical/moral code of conduct for data use and collection

4

data science

application of computational and statistical techniques to address/gain insight on a real-world problem

- data can be unstructured, structured, semi-structured

- data cleansing, prep, analysis

5

Data Science Project Process

Identify

Design Research Plan

Collect data

Analyze data

Extract result

Publish/exploit results

6

where in the Data Science Project Process does "ethics approval" normally occur?

Designing your research plan

7

Step 1 DSPP

Identify hypothesis/question

Example: A study about online shopping recommendations. Researchers plan to collect data about users' shopping habits.

8

Step 2 DSPP

Design research plan

Consider any potential positive impacts (getting more personalized shopping suggestions) or negative impacts (feeling that their privacy is invaded) on participants, ensuring their well-being and informed consent. Researchers should focus on both potential positive and negative impacts.

this is where ethics approval occurs

9

Step 3 DSPP

Collect Data

* How could bias in the data affect results? Could data be used for other purposes?

Bias in Data: Biased data can lead to biased results, affecting the fairness and accuracy of findings. (If the data only comes from a certain demographic, the recommendations might only work well for that group, causing unfairness for others.)

Data Usage: Be aware that collected data might be repurposed for unintended uses that could harm privacy and consent. (The data could later be used to target users with ads without their consent.)
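A quick way to spot the collection bias described above is to compare each group's share of the collected data with its share of the population the results are meant to serve. This is a minimal Python sketch; the `age_group` field, the sample, and the 50/50 reference shares are all hypothetical.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares):
    """Return (share in dataset - reference share) for each group."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical shopping-study sample: 8 of 10 users are 18-34.
sample = [{"age_group": "18-34"}] * 8 + [{"age_group": "35+"}] * 2
# Assumed population shares, for illustration only.
reference = {"18-34": 0.5, "35+": 0.5}

gaps = representation_gaps(sample, "age_group", reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")  # positive = overrepresented
```

Here the 18-34 group is overrepresented by about 30 percentage points, so recommendations tuned on this sample may work worse for older users. A gap near zero only checks one attribute; it does not prove the data is unbiased.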

10

Step 4 DSPP

Analyze data

* Is the analysis of the data introducing bias? any uncertainty or assumptions?

Bias in Analysis: Identify whether the analysis method introduces bias. (If researchers only focus on certain types of purchases, the recommendations could be biased toward those items, affecting the accuracy and fairness of results.)

Uncertainty and Assumptions: these affect the reliability and scope of the results. (Assuming that users' past purchases predict their future preferences might not hold true for everyone, introducing uncertainty into the results.)

11

Step 5 DSPP

Extract results

* Can the result be misinterpreted? Are the uncertainties, assumptions, biases properly represented?

Misinterpretation: Presenting improved shopping recommendations might be misinterpreted as an absolute guarantee of satisfaction when it's actually a probability.

Representing Uncertainties, Assumptions, Biases: State the limitations and potential sources of error to provide accurate context for the results. (The researchers should include a clear explanation of how they arrived at the recommendations, addressing potential biases and limitations.)

12

Step 6 DSPP

Publish/Exploit Result

* Can the results be abused? Can disclosing or not disclosing them have ethical consequences?

Abuse of Results: results could be used to cause harm or enable unethical actions. (Manipulating users into buying things they don't need.)

Ethical Consequences of Disclosure: Weighing the ethical implications of sharing or withholding certain results, and how they might affect individuals or society. It's like when you have a secret and you need to decide if you should tell your friend.

13

why is the Data Science Project Process important?

Because you're actively practicing data ethics, which involves not only complying with legal regulations but also making ethical choices that promote fairness, transparency, and the well-being of individuals and society. This approach enhances the credibility and impact of your research while minimizing potential negative consequences.

14

ethics

shared principles guiding moral judgement

- cornerstone of civilization

15

ethics vs laws

ethics guide the creation of laws

- laws can be used to enforce ethical behaviors, but ethical values are not laws. Even though it is not ethical to tell someone's secret, it is not against the law.

16

morals

individual's beliefs concerning what is right or wrong

- often shaped by cultural, religious, or personal values

- subjective and varied amongst everyone

example: vegetarianism

- eating meat is not against the law or unethical overall; vegetarians just believe that they do not want to consume it

17

data

any set of information that can be collected, stored, analyzed

18

big data

datasets with large volume, created and updated with high velocity, that have varied structures and formats

19

volume

the amount of data from sources

20

velocity

the speed at which big data is generated

21

variety

the types of data (structured, unstructured, semi-structured)

22

how do companies manage large datasets?

Apache Spark, Hadoop, cloud-based storage (AWS, Google Cloud)

23

how do data scientists deal with variety in data?

statistics, CS, machine learning, AI

24

why utilize data?

for better decision-making, customer-centric facilities, and target marketing

25

example of data facilitator

internet because it is used by all types of people and organizations

26

What can you analyze from this data?

- medical records

- patient demographics

- lab results

predict disease outbreaks

optimize treatment plans

provide insights for medical research

27

what data are you sharing?

social network friend's list

location

web searches

chat logs

IP addresses

web history

28

why are data ethics needed?

though data can be very helpful, people can misuse it; therefore, ethics need to be considered

- rules still evolving

29

decision-making and policy development

policies must be fair and based on reliable data

30

social and economic inequality

need to avoid systematic bias and ensure equitable outcomes

31

privacy and data protection

maintain individuals' privacy rights

32

societal impact of data science and the need for responsible practices

trust and transparency

ethical considerations in AI and automation

social good and public interest

data governance and regulation

33

benefits of data ethics

consistency

better data-driven decisions

increased transparency

34

consistency

helps all data users navigate the ethical considerations of data use

35

better data-driven decisions

uncover data limitations, gaps, and biases; facilitate justifiable decisions with data --> promote transparency

36

increased transparency

organizations use trustworthy data processes to increase the transparency of their data

37

structured data

names, addresses, credit card numbers, geolocation

38

unstructured data

photos, audio, video, social media posts

39

major sources of abundant data

business, science, society and everyone

40

why is data ethics needed?

data is a valuable asset that can build the next great business/innovation, but it is also a resource lots of organizations are not protecting or using ethically.

41

ethics

govern professional interactions

42

laws

govern society as a whole

43

morals

govern private, personal interactions

44

what are areas of concern in data ethics?

data collection

data ownership

data privacy

data anonymity

data validity

algorithmic and statistical fairness

45

data collection

process of gathering information from various sources

46

example of data collection ethics

ensuring that respondents give consent before providing their information

47

data ownership

the act of having legal rights and complete control to make decisions about data

48

example of data ownership ethics

you took a stunning sunset photo of me, but before you include it in a blog post, you ask for explicit permission from me to use the photo for a specific purpose

49

important lesson in data ownership

must ask for consent and respect the rights of the data owner

50

data privacy

the protection and appropriate use of personal information

involves safeguarding individuals' personal data and ensuring that their privacy rights are respected

51

example of data privacy ethics

a customer gives the company consent to collect and store their PII, but that does not mean they want it publicly available

52

ways to implement data privacy

dual authentication passwords

file encryption

informed consent

53

data anonymity

preventing the identification of individuals within a dataset when handling and sharing data, so that data cannot be linked back to someone

54

confidentiality

centers around protecting data from unauthorized access or disclosure --> implementing security measures

55

data validity

the integrity and accuracy of data being collected and analyzed for various purposes

represents real-world phenomena and is free from errors and biases
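Part of checking validity can be automated with simple rule-based checks run before analysis. This is a minimal Python sketch; the field names and the 0-120 age range are assumptions for illustration. Checks like these catch obvious errors (missing or impossible values), not subtler problems such as sampling bias.

```python
def validate(record):
    """Return a list of problems in one survey record (hypothetical rules)."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("missing age")
    elif not 0 <= age <= 120:  # assumed plausible age range
        problems.append(f"age out of range: {age}")
    if not record.get("country"):  # None or empty string
        problems.append("missing country")
    return problems


# Hypothetical survey rows.
rows = [
    {"age": 34, "country": "US"},
    {"age": -5, "country": "US"},
    {"age": None, "country": ""},
]
for i, row in enumerate(rows):
    print(i, validate(row) or "ok")
```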

56

algorithmic fairness

ethical consideration and practice of ensuring that algorithms used in decision-making processes do not result in unfair or biased outcomes
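One common way to check for unfair outcomes is to compare selection rates across groups (often called demographic or statistical parity). This is a minimal Python sketch with hypothetical groups and decisions; it is only one of several fairness definitions, and different definitions can conflict with each other.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, was_selected). Returns rate per group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


# Hypothetical outcomes from an automated screening model.
decisions = (
    [("group_A", True)] * 6 + [("group_A", False)] * 4
    + [("group_B", True)] * 2 + [("group_B", False)] * 8
)
rates = selection_rates(decisions)
gap = max(rates.values()) - min(rates.values())
print(rates)
print(f"parity gap: {gap:.2f}")
```

A large gap is a signal to investigate, not a verdict on its own.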

57

misuse of statistics

unethical practice of distorting or manipulating statistical information to support a particular agenda or draw false conclusions

58

data confidentiality

ensuring that data is protected from unauthorized use.

implementing security measures to prevent data breaches

59

example of data privacy

social media privacy

60

example of data anonymity

healthcare data does not have names, just identification numbers
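Replacing names with identification numbers, as in the healthcare example, can be sketched in Python as below. The field names and the `P-` ID format are hypothetical. Strictly, this is pseudonymization: whoever holds the lookup table can link records back to people, and leftover fields such as date of birth or ZIP code can still re-identify someone, so the table must be protected or destroyed.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace a direct identifier with a random ID.

    Returns the cleaned records and the name -> ID lookup table.
    """
    lookup = {}
    cleaned_records = []
    for rec in records:
        person = rec[id_field]
        if person not in lookup:
            new_id = "P-" + secrets.token_hex(4)
            while new_id in lookup.values():  # avoid an (unlikely) duplicate ID
                new_id = "P-" + secrets.token_hex(4)
            lookup[person] = new_id
        cleaned = {k: v for k, v in rec.items() if k != id_field}
        cleaned["patient_id"] = lookup[person]
        cleaned_records.append(cleaned)
    return cleaned_records, lookup


# Hypothetical lab records.
records = [
    {"name": "Alice", "lab_result": "A1C 5.4"},
    {"name": "Bob", "lab_result": "A1C 6.1"},
    {"name": "Alice", "lab_result": "LDL 110"},
]
shared, key_table = pseudonymize(records)
for row in shared:
    print(row)  # no names, only lab_result and patient_id
```

The same person keeps the same ID across records, so the shared data is still useful for analysis.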

61

example of data confidentiality

a bank employing encryption to protect data and implementing strict access controls

62

Facebook Cambridge Analytica Scandal

data acquisition through a third party app called "This is your Digital Life"

63

Facebook Cambridge Analytica Scandal - Data harvesting from friends

the real ethical data breach occurred when the app also accessed the personal data of its users' Facebook friends, who had not agreed to share it

64

Facebook Cambridge Analytica Scandal - Scope of data collected

profile details, likes, friends' lists, users' psychological profiles and preferences

65

Facebook Cambridge Analytica Scandal - Purpose of data collected

used by Cambridge Analytica (a political consulting firm) to target political ads and influence voting behavior

66

First AI Beauty Contest in 2016

used AI to judge and select winners based on the contestants' submitted photos

- the winners were almost exclusively of one skin tone, even though many people of color submitted photos

67

First AI Beauty Contest in 2016 - Violation of Fairness

exhibited racial bias by disproportionately selecting winners who were unrepresentative of the global population

68

First AI Beauty Contest in 2016 - Underrepresentation

the lack of diversity raises the question of whether the AI system was genuinely inclusive and fair in its assessment

69

First AI Beauty Contest in 2016 - Root causes of bias

data bias and algorithmic bias

70

data bias

lack of diversity in the training data for AI

71

algorithmic bias

if training data lacks diversity, the algorithm may not have learned to recognize and appreciate a broad range of beauty standards

72

concerns with TikTok

owned by a foreign company (ByteDance)

questions have been raised about where TikTok stores user data, especially given the company's foreign ownership

- potential for data access by the foreign government

- how it selects and prioritizes recommended content

73

Facebook and mental health services

crisis intervention bot

74

crisis intervention bots

engage with users who express thoughts of self-harm or suicide and provide immediate support

75

the issue with Facebook and mental health services

visitors of websites are often forced to have their information collected even when it is said to be anonymous

crisis-line websites used the Meta Pixel, and visitors' data was being sent to Facebook, so it was no longer anonymous (pixel-based data could easily be unscrambled to reveal true identities)

76

Quest Diagnostics Breach 2019

significant cybersecurity incident where an unauthorized user gained access to the systems of a third-party billing collections vendor used by Quest and exposed personal and medical information of 12 million patients

77

scope of the Quest Diagnostics Breach 2019

one of the largest healthcare data breaches at the time

78

data exposed in Quest Diagnostics Breach 2019

names, addresses, phone #s, DOB, SSN, lab results

79

Quest Diagnostics Breach 2019 and its relation to complex ownership

highlights the challenges in determining who ultimately owns and is responsible for protecting patient data when so many parties are involved

80

BBC News: Facial recognition fails on race

NIST found that facial recognition software exhibited lower accuracy in identifying the faces of African Americans and Asian Americans compared to Caucasians

particularly pronounced in one-to-one matches
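A finding like NIST's comes from measuring accuracy separately for each demographic group instead of reporting one overall number. This is a minimal Python sketch; the group labels and the results are made up for illustration and are not NIST's data.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """results: iterable of (group, predicted_match, true_match)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in results:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical one-to-one verification results: (group, predicted, actual).
results = (
    [("group_1", True, True)] * 95 + [("group_1", True, False)] * 5
    + [("group_2", True, True)] * 80 + [("group_2", True, False)] * 20
)
acc = accuracy_by_group(results)
overall = sum(p == a for _, p, a in results) / len(results)

print(f"overall: {overall:.3f}")  # overall: 0.875
for group, value in acc.items():
    print(f"{group}: {value:.2f}")  # one line per group
```

The overall accuracy (0.875 here) hides the 15-point gap between the two groups, which is why per-group reporting matters.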

81

root cause of the bias in the BBC News scandal

the facial recognition software was trained mainly on databases from government agencies (State Dept., FBI) consisting primarily of white individuals

therefore the lack of diversity in the training data resulted in the algorithm inheriting biases

82

lesson learned from BBC News scandal

critical need for representative datasets to address algorithmic bias

83

Quest Diagnostics data breach in 2019: what to consider

clear data ownership agreements

data access controls

data encryption

data minimization

security audits and assessments
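Data minimization, one of the items above, can be as simple as passing a third party only the fields it needs for its task. This is a minimal Python sketch; the field names and the choice of which fields a billing vendor "needs" are assumptions for illustration, not a description of what Quest actually shared.

```python
# Assumed set of fields a billing vendor needs; adjust per purpose.
NEEDED_FOR_BILLING = {"patient_id", "amount_due", "billing_address"}


def minimize(record, allowed_fields):
    """Return a copy of the record containing only the allowed fields."""
    return {k: v for k, v in record.items() if k in allowed_fields}


# Hypothetical full patient record held by the lab.
full_record = {
    "patient_id": "P-1042",
    "amount_due": 120.00,
    "billing_address": "123 Main St",
    "ssn": "000-00-0000",
    "lab_results": "A1C 5.4",
}
shared_with_vendor = minimize(full_record, NEEDED_FOR_BILLING)
print(shared_with_vendor)  # no SSN or lab results
```

If the vendor is breached, fields it never received cannot be exposed through it.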

84

sources of data

primary and secondary

85

primary data

used when no related research has been done on the subject/topic

collecting brand new data

86

secondary data

data regarding a specific topic is readily available or collected by someone else

87

before collecting data, need to consider

the question you aim to answer

the data subjects you need to collect data from

the collection timeframe

data collection methods best suited to your needs

88

types of quantitative data

raw numbers and digits

ratio

interval
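The difference between interval and ratio data shows up in arithmetic: both support meaningful differences, but only ratio data has a true zero, so only its ratios mean anything. A small Python illustration using temperature (Celsius is an interval scale, Kelvin is a ratio scale); the two temperatures are arbitrary examples.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15


warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales: a 10-degree gap either way.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios only make sense on the ratio scale.
naive_ratio = warm_c / cool_c  # 2.0, but 20 C is not "twice as hot" as 10 C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, round(diff_k, 2))
print(naive_ratio, round(true_ratio, 3))
```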

89

types of qualitative data

customer reviews or feedback

nominal and ordinal

90

how can qualitative data be collected?

answering questions (how much, how often)

one-on-one interviews, observations, focus group meetings, surveys

allows mathematical analysis

91

categorical data

grouped based on the categories

92

three classification types for categorical data

binary

nominal

ordinal

93

binary

can take only two possible states (true/false, yes/no)

94

nominal

labeled data classified into various groups with no ranks or order between them

(country, gender, hair color)

95

ordinal

groups based on order or ranking

(economic status)
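The three categorical types are usually encoded differently before analysis: binary as 0/1, nominal as one-hot columns (so no false order is implied), and ordinal as ranks that keep the order. This is a minimal Python sketch; the example values and the low/middle/high levels for economic status are assumptions.

```python
# Binary: two possible states map to 1/0.
smoker = ["yes", "no", "yes"]
smoker_encoded = [1 if v == "yes" else 0 for v in smoker]

# Nominal: no order, so give each category its own 0/1 column (one-hot)
# instead of numbering them, which would imply a false ranking.
hair_color = ["brown", "black", "red"]
categories = sorted(set(hair_color))  # ['black', 'brown', 'red']
hair_one_hot = [[int(v == c) for c in categories] for v in hair_color]

# Ordinal: order matters, so map to ranks that preserve it.
# These economic-status levels are assumed for illustration.
STATUS_RANK = {"low": 0, "middle": 1, "high": 2}
status = ["middle", "low", "high"]
status_encoded = [STATUS_RANK[v] for v in status]

print(smoker_encoded)  # [1, 0, 1]
print(hair_one_hot)    # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(status_encoded)  # [1, 0, 2]
```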

96

first-party data

collected directly from users by your organization

(most valuable because you receive information about how your audience behaves, thinks, feels - all from a trusted source)

97

second-party data

data shared by another organization about its customers (i.e., its first-party data)

98

third-party data

data that has been aggregated and rented or sold by orgs. that do not have a connection to your company or users

99

data collection methods

online survey

paper survey

interview and focus groups

forms

polls

voting

social media monitoring

online tracking

100

interviews

one-on-one conversations