DSCI 2013 – Data Science Literacy – Midterm Review Sheet

62 Terms

1

Define Data Science (be familiar with the contextual model diagram)

Data Science is the field that advances methods to improve the use of data for human progress.

2

Explain how a social networking site illustrates the various aspects of our definition of data science.

1. Data objects representing you, your friends, and your connections mathematically.

2. Reports and dashboards giving insight into friendships, extracted from data.

3a. An AI system for recommending new friends.

3b. New insights and measurable performance (time spent, likes, taps, clicks, ad revenue).
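Items 1 and 3a can be made concrete with a small sketch. It assumes friendships are stored as sets of names (all names are hypothetical) and uses a simple "most mutual friends" heuristic as a stand-in for a recommender; real social networks use far more elaborate models.

```python
from collections import Counter

# Hypothetical friendship data: each person maps to the set of their friends.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ana", "ben", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}

def recommend_friends(person, graph):
    """Rank people who are not yet friends by how many mutual friends they share."""
    counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            # Skip the person themself and anyone they are already friends with.
            if candidate != person and candidate not in graph[person]:
                counts[candidate] += 1
    return counts.most_common()

print(recommend_friends("ana", friends))  # [('dee', 2), ('eli', 1)]
```

Here "dee" ranks first because both of Ana's friends know Dee, which is the kind of signal a recommender can extract once people and connections are represented as data objects.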

3

Define digital transformation

The adoption of digital technology to replace manual processes with digital processes. (Digitalization)

4

What three factors make "Big Data" so important now?

• Massive amounts of data about many aspects of human life

• Abundance of inexpensive computing power

• Competitive advantage when data are actually used

5

How do companies use data science to gain a competitive advantage?

-Business planning
-Performance tracking
-Process automation
-Market research

6

What Knowledge, Skills, and Abilities are involved in Data Science teamwork?

• Cultural Understanding
• Curiosity and Understanding
• Subject-Matter Expertise
• Mathematics / Statistics / Analytics
• Data Wrangling - parsing, scraping, formatting data
• Visualization
• Programming Novel Computing Tools
• Use of Existing Computing Tools
• Logic / Wisdom / Expertise
• Leadership and Communication
• General Flow: Problem -> Data Science -> Value

7

What are the characteristics of a data-science-literate professional?

• Ability to know when, how, and in what ways data science teams could benefit the problem at hand.
• Ability to anticipate what might be relevant for a data science team to know about a subject-matter domain about which the data scientist may be partially or wholly ignorant.
• Ability to articulate what data are available in one's field.
• Ability to anticipate data that could be collected or created for future analysis by data scientists.

8

Define data

Pieces of information that have been translated into a form that is more efficient for storage, movement, or processing.

9

Why is context important for understanding the meaning of data?

Data do not speak for themselves; theory from a particular subject-matter area is needed to provide the context for understanding them.

Sociology of knowledge - the study of the relationship between human thought and the social context within which it arises, and of the effects that prevailing ideas have on societies.

10

Define datafication

Taking all aspects of life and turning them into data.

Once we datafy things, we can transform their purpose and turn the information into new forms of value.

11

What are examples of data produced by humans?

1. Information / Measurements Gathered as Part of an Experimental Research Design
2. Data Collected as Part of Case Management
3. Digital Communications or Actions
4. Social Media and Chat
5. Mass Communication
6. Polling and Surveys

12

How often is the US Census performed?

The US Census is performed every 10 Years.

13

What kinds of sensors produce data?

1. Cameras and Microphones -> Photos, Video, Sound Files
2. Internet of Things
3. Modern Manufacturing and Industrial Production

14

What kind of data would help us understand why a computer system might be malfunctioning?

1. Application Logs
2. Operating System Logs
3. Access Logs
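As a rough sketch of how such logs get used, the code below counts entries by severity and finds the most frequent error. The log format, the `summarize` helper, and the sample lines are all hypothetical; real application, operating system, and access logs each have their own formats.

```python
import re
from collections import Counter

# Hypothetical log lines in a "timestamp LEVEL message" format.
LOG_LINES = [
    "2024-03-01T10:00:01 INFO service started",
    "2024-03-01T10:05:12 WARN disk usage at 85%",
    "2024-03-01T10:07:45 ERROR database connection refused",
    "2024-03-01T10:07:50 ERROR database connection refused",
    "2024-03-01T10:09:03 INFO request served",
]

# Timestamp, then a severity level, then the free-text message.
LINE_PATTERN = re.compile(r"^(\S+) (INFO|WARN|ERROR) (.*)$")

def summarize(lines):
    """Count entries per severity level and occurrences of each ERROR message."""
    levels, errors = Counter(), Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        _, level, message = match.groups()
        levels[level] += 1
        if level == "ERROR":
            errors[message] += 1
    return levels, errors

levels, errors = summarize(LOG_LINES)
print(levels["ERROR"])        # 2
print(errors.most_common(1))  # [('database connection refused', 2)]
```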

15

What is a programming language?

A programming language is a formal language (as opposed to a natural language, such as the languages humans speak). Its syntax and meaning are precisely defined so that a programmer can implement sets of instructions that work together to achieve a specific output.

16

What are two key programming languages used for Data Science work?

• R
• Python

17

When would a data scientist use GIS?

Useful for any field where location is important (Energy, Real Estate, Military, City Planning, Politics, etc.)

18

Give three examples of Python libraries commonly used for machine learning.

TensorFlow, Keras, and PyTorch

19

What is the most important Data Science Tool?

Your brain

20

Define Privacy

The right of people to control how information about them is collected, used, and disclosed.

21

What right does the Fourth Amendment to the US Constitution protect?

The right to be free from unreasonable searches and seizures: "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized."

22

What key principles did the 1970 Fair Credit Reporting Act establish?

• There should be no secret collections of data that are used to make decisions about a person's financial life / credit
• Individuals should have a right to examine and challenge the accuracy of information held in such collections
• Information in such a collection should expire after a reasonable amount of time

23

Which US law made hacking illegal?

1986 Computer Fraud and Abuse Act

24

What kind of data does HIPAA protect?

Medical information: HIPAA regulates the collection, use, and disclosure of medical information by health care providers and others who come into contact with medical records.

25

What kind of data does FERPA protect?

The privacy of student education records.

26

What law amended nearly every previous privacy law?

The USA PATRIOT Act (2001)

27

The GDPR governs data privacy in what political region?

The European Union

28

What is ethics?

Ethics is the study of moral principles - the science of right vs. wrong

29

What is the end goal of ethics?

The end goal of ethics is to establish justice.

30

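What does "informed consent" mean for data collection?

Subjects know, and agree to, how the data collected about them will be stored and used. The key question: "Is the subject aware of how the data collected about them will be stored and used?"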

31

Why is ensuring data security an ethical obligation?

To protect individuals from potential harm, such as identity theft or privacy breaches.

32

What is an algorithm?

A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
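Euclid's algorithm for the greatest common divisor is a classic example of a fixed set of rules followed step by step until an answer is reached. A minimal Python version:

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b) until b is 0."""
    while b != 0:
        a, b = b, a % b
    return abs(a)

print(gcd(48, 18))  # 6
```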

33

How could facial recognition be unjust?

• Algorithms boast 90% accuracy overall, but that accuracy is uneven across groups.
• Accuracy is poorest for subjects who are female, Black, and 18-30 years old.
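The point that a single accuracy figure can hide uneven performance can be checked with simple arithmetic. The groups and counts below are made up purely for illustration and are not the published figures behind this card.

```python
from collections import defaultdict

# Made-up evaluation results for illustration only: (group, prediction_was_correct).
results = (
    [("group_a", True)] * 95 + [("group_a", False)] * 5
    + [("group_b", True)] * 85 + [("group_b", False)] * 15
)

def accuracy_by_group(records):
    """Return overall accuracy and the accuracy within each group."""
    totals, correct = defaultdict(int), defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        correct[group] += ok  # True counts as 1, False as 0
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group

overall, per_group = accuracy_by_group(results)
print(overall)    # 0.9
print(per_group)  # {'group_a': 0.95, 'group_b': 0.85}
```

The overall figure of 90% is true, yet one group sees three times the error rate of the other (15% versus 5%).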

34

What are the three types of algorithmic bias?

• Pre-Existing
• Technical
• Emergent

35

How might it be unethical NOT to use data?

• Like any technology, there is a process of adoption and diffusion, unsettling older technologies, forcing us to learn to use new technologies wisely, etc.

• Data Science, at its best, will help us to move beyond human action not informed by reality. At its worst, it can reproduce existing biases at a larger scale.

36

Before research with human subjects can be conducted at a university (or any other place where research is carried out), who must approve it?

The IRB (Institutional Review Board)

37

What are some of the safeguards that keep a Google data center secure?

-Security Operations Center
-Vehicle Crash Barriers
-Overlapping Cameras
-Biometric access centers
-Thermal camera detection
-Redundant Power Systems
-Motion Detecting Fences
-Data encryption
-ID scanners
-Disk and Hard Drive Grinder
-Ethical hackers

38

What is Data Governance and Management?

A collection of administrative processes that affect the acquisition, validation, storage, protection, and processing of required data, ensuring the quality, accessibility, reliability, and timeliness of the data so that stakeholders can use it to achieve organizational goals effectively.

39

To what does good Data Management or Governance lead?

• Data Quality
• Privacy
• User, Customer, or Constituent Trust
• Compliance with Regulatory Requirements
• Effective Use of Data to Meet Strategic Goals (Accurate Reporting, Analytics, Dashboards, Enables Better Decision-Making)
• Good Management

40

What is Data Quality?

The condition of data measured in terms of factors like:
• Relevance
• Timeliness and Recency
• Accuracy
• Consistency - a single source of truth
• Completeness
• Reliability
• Reduced Risk - i.e., Safe to Use for Intended Purpose
• Traceability / Lineage

41

How did the Yahoo Data Breach of 2014 affect Yahoo?

An attack by state-sponsored hackers compromised the real names, email addresses, dates of birth, and telephone numbers of 500 million users.

42

What are the five parts (core functions) of the NIST Cybersecurity Framework?

• Identify
• Protect
• Detect
• Respond
• Recover

43

Be able to describe the five core functions of the NIST Cybersecurity Framework.

• Identify - the "what" of your organization and its data and systems.
• Protect - take steps to secure your organization's data and systems.
• Detect - watch for and be able to know when your data and systems are threatened.
• Respond - create and carry out procedures to take action when a threat has been detected
• Recover - develop and carry out procedures to restore business activities, eliminate the exploited vulnerability, and come back stronger.

44

What is encryption?

A way to scramble data so that only an authorized party can unscramble it.
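To show the scramble/unscramble idea, here is a toy one-time pad built on XOR, using Python's standard `secrets` module for the random key. This is a teaching sketch only: the key must be truly random, at least as long as the message, and never reused, and real systems rely on vetted algorithms and libraries rather than hand-written ciphers.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    if len(key) < len(data):
        raise ValueError("one-time-pad key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))  # random key shared only with the authorized party

ciphertext = xor_bytes(message, key)     # scrambled bytes
plaintext = xor_bytes(ciphertext, key)   # XOR with the same key unscrambles
print(plaintext)  # b'meet at noon'
```

Without the key, the ciphertext is just random-looking bytes; with it, the same XOR operation restores the original.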

45

What is the difference between system access and system entitlements?

• System Access - Whether a particular person may use a system.

• System Entitlements - What actions a user with access to the system may take.
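The distinction can be sketched as two separate checks: first whether the user may use the system at all, then whether they may take a particular action. The users, actions, and the `is_allowed` helper are hypothetical.

```python
# Hypothetical access list: who may use the system at all.
ACCESS = {"alice", "bob"}

# Hypothetical entitlements: what each user with access may do.
ENTITLEMENTS = {
    "alice": {"read", "write"},
    "bob": {"read"},
}

def is_allowed(user: str, action: str) -> bool:
    """A user needs both system access and an entitlement for the action."""
    if user not in ACCESS:
        return False  # no system access
    return action in ENTITLEMENTS.get(user, set())

print(is_allowed("bob", "read"))    # True
print(is_allowed("bob", "write"))   # False: has access, but no entitlement
print(is_allowed("carol", "read"))  # False: no access at all
```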

46

What is a zero-day vulnerability?

A genuine flaw in software or systems that is discovered before a patch is available, so the vendor has had "zero days" to fix it. Such flaws are sometimes kept private by hackers.

47

What are some ways to establish physical security?

• Locks
• Keys (Physical Keys, Codes, Biometrics, Multi-Factor)
• Separation
• Access Records / Logs
• Prevention of Line-of-Sight / Earshot Exploits
• Shredding / Destruction

48

What is a Data Entity?

Person, Place, or Thing that you want to represent as a data object and track in a database, system of files, etc.
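In code, an entity is often modeled as a class, with one field for each property being tracked. A minimal sketch with a hypothetical `Student` entity and made-up values:

```python
from dataclasses import dataclass

@dataclass
class Student:
    """Hypothetical 'Student' entity; each field is one property we track."""
    student_id: int  # identifying property, usable as a key
    name: str
    major: str

jordan = Student(student_id=1001, name="Jordan Lee", major="Data Science")
print(jordan.major)  # Data Science
```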

49

Each entity has ____?

"Attributes" (properties or traits)

50

What do we use ERDs to do? What does ERD stand for?

ERD stands for Entity Relationship Diagram. We often use ERDs to explain or document a data model.

51

What is cardinality?

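The number of items in a set. In an ERD, cardinality is the minimum and maximum number of instances of one entity that can be related to a single instance of another entity (e.g., one-to-many).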
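As an illustration, suppose a hypothetical ERD says each order belongs to exactly one customer, while a customer may have zero or many orders. The sketch below checks the observed minimum and maximum on made-up data.

```python
from collections import Counter

# Hypothetical data: customer_id -> name, and (order_id, customer_id) pairs.
customers = {1: "Ana", 2: "Ben", 3: "Cy"}
orders = [(101, 1), (102, 1), (103, 2)]

# Order side: each order carries exactly one customer_id (min 1, max 1),
# and that id must refer to a customer that exists.
assert all(cust in customers for _, cust in orders)

# Customer side: count how many orders each customer has.
orders_per_customer = Counter(cust for _, cust in orders)
counts = [orders_per_customer.get(c, 0) for c in customers]
print(min(counts), max(counts))  # 0 2  (Cy has no orders; Ana has two)
```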

52

What are the three types of data models?

conceptual, logical, physical

53

What is normalization?

The practice of storing each piece of information only once and linking related information through "keys."
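A small sketch of the idea, using hypothetical student and department data: the department name is repeated in the flat version, stored once in the normalized version, and recovered by following the key.

```python
# Denormalized: the department name is repeated on every student row.
flat_rows = [
    {"student": "Ana", "major": "DSCI", "dept": "Data Science"},
    {"student": "Ben", "major": "DSCI", "dept": "Data Science"},
    {"student": "Cy", "major": "MATH", "dept": "Mathematics"},
]

# Normalized: each department name is stored once, keyed by major code,
# and student rows keep only the key.
majors = {}
students = []
for row in flat_rows:
    majors[row["major"]] = row["dept"]
    students.append({"student": row["student"], "major": row["major"]})

# Following the key rebuilds the original view (this is what a join does).
joined = [{**s, "dept": majors[s["major"]]} for s in students]

print(majors)               # {'DSCI': 'Data Science', 'MATH': 'Mathematics'}
print(joined == flat_rows)  # True
```

If the Data Science department were renamed, the normalized version would need one update instead of one per student.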

54

What is SQL?

Structured Query Language: the standard language for defining, querying, and updating data in relational databases.
Purpose:
- Great for data-gathering software applications
- Case management systems (e.g., Banner at MSU)
- Data warehousing
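A minimal SQL example using Python's built-in `sqlite3` module with a throwaway in-memory database. The table and its rows are hypothetical; production systems run on larger database servers whose SQL dialects differ in details, but the core query syntax is similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students (id, name, major) VALUES (?, ?, ?)",
    [(1, "Ana", "DSCI"), (2, "Ben", "DSCI"), (3, "Cy", "MATH")],
)

# Declarative query: state *what* you want; the database decides *how* to get it.
rows = conn.execute(
    "SELECT major, COUNT(*) FROM students GROUP BY major ORDER BY major"
).fetchall()
print(rows)  # [('DSCI', 2), ('MATH', 1)]
conn.close()
```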

55

What is an unstructured database?

An unstructured database stores information that is not kept in a fixed, predefined format; such databases are sometimes referred to as "NoSQL." They suit information that is complex and cannot be reduced to a small number of fields.

56

Why might a flat data structure be good?

• Often preferred for analysis / analytics
• Sometimes preferred to speed up searches (indexing)
• Used for "batch" exchanges of data (exporting and importing)
• Used for reports
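A sketch of flattening nested records into a flat table for a batch export, using Python's standard `csv` module. The records, course codes, and column names are hypothetical.

```python
import csv
import io

# Hypothetical nested records, as an application might store them.
students = [
    {"id": 1, "name": "Ana", "courses": ["DSCI 2013", "MATH 1203"]},
    {"id": 2, "name": "Ben", "courses": ["DSCI 2013"]},
]

# Flatten: one row per (student, course) pair, so every row has the same columns.
flat = [
    {"id": s["id"], "name": s["name"], "course": c}
    for s in students
    for c in s["courses"]
]

buffer = io.StringIO()  # stands in for a CSV file on disk
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "course"])
writer.writeheader()
writer.writerows(flat)
print(buffer.getvalue())
```

The flat form repeats the student's name on each row, which normalization avoids, but it is exactly the shape that spreadsheets, reports, and analysis tools expect.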

57

Vector databases are often used in ______?

AI applications
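Vector databases store numeric "embedding" vectors and answer queries by finding the most similar ones. The brute-force sketch below uses cosine similarity on made-up 3-dimensional vectors; production vector databases use much higher-dimensional embeddings and approximate nearest-neighbor indexes to stay fast.

```python
import math

# Hypothetical documents with made-up 3-dimensional "embedding" vectors.
vectors = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, k=2):
    """Brute-force k-nearest-neighbor search by cosine similarity."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:k]

print(nearest([1.0, 0.2, 0.0]))  # ['doc_cats', 'doc_dogs']
```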

58

Why would a graph database be useful?

• When one needs to retrieve and traverse complicated relationships between data objects quickly
• When faster retrieval of complicated relationships is required
• When a query language other than SQL is required
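Traversing relationships is the core operation here. The sketch below does it with a breadth-first search over a hypothetical "follows" graph in plain Python; a graph database performs this kind of traversal natively, through its own query language.

```python
from collections import deque

# Hypothetical "follows" relationships stored as an adjacency list.
graph = {
    "ana": ["ben"],
    "ben": ["cy", "dee"],
    "cy": ["eli"],
    "dee": ["eli"],
    "eli": [],
}

def shortest_path(start, goal):
    """Breadth-first search: return the path with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(shortest_path("ana", "eli"))  # ['ana', 'ben', 'cy', 'eli']
```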

59

What are the pros of blockchain?

• Solves the "Trust" Problem
• No Intermediaries; Better than Third-Party Solutions (saves time and money)
• Is a nondestructive way to track data changes over time
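The "nondestructive tracking" idea rests on hash chaining: each block stores a hash of the previous block, so rewriting history breaks the chain. This sketch uses Python's `hashlib` and covers only that one idea; it leaves out the distributed consensus that real blockchains use to address the trust problem. The helper names are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 of a block's contents, serialized deterministically."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain: list, data: str) -> None:
    """Append a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})

def is_valid(chain: list) -> bool:
    """Every block must point at the current hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

ledger = []
add_block(ledger, "balance set to 100")
add_block(ledger, "balance changed to 80")
print(is_valid(ledger))  # True

ledger[0]["data"] = "balance set to 1000"  # attempt to rewrite history
print(is_valid(ledger))  # False: the edit breaks block 1's link to block 0
```

Changes are recorded by appending new blocks rather than overwriting old ones, which is why the full history stays available.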

60

What are the cons of blockchain?

Brings with it complex policy questions around:
• Governance
• Economics
• International law
• Security

61

Define Confidentiality

The obligation of those who have access to private information not to disclose it to others.

62

Define Security

Technological, physical, or administrative safeguards or tools designed to protect data from unwarranted access or disclosure.