Small Data
• Ready for analysis, flat file, no need to merge (many) tables
• Located in database or local PC
• Spreadsheet or viewed on few sheets of paper
• Impact decisions in the present (transactional databases)
Big Data
• Large chunks of (usually) unstructured data
• Cloud, Offshore, SQL Server, etc
• Over 50k variables, over 50k individuals, random samples, unstructured
• Purpose is data mining
The analysis part makes it important and aids decision making
What is the main difference between big and small data?
Small Data: You can easily understand what you are looking at and use it to make decisions without any analysis
Big Data: Usually unstructured data that doesn't make sense without further analysis
Structured Data
Stored in a traditional system such as a relational database or spreadsheet
Ex. employee table --> spreadsheet
Simple to make sense of
unstructured data
Different data types
Not laid out in way that is easily read by humans
semi-structured data
data that has some organization but is not fully organized to be inserted into a relational database
Has some structure, but is still not as easy to read
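The contrast between structured and semi-structured data can be sketched in a few lines of Python. This is only an illustration: the employee records, field names, and email address below are made up, and CSV and JSON are assumed as typical stand-ins for each kind of data.

```python
import csv
import io
import json

# Structured: every row has the same columns, so it fits a table directly.
# (The employee data here is invented for illustration.)
structured_csv = "employee_id,name,department\n1,Ana,Sales\n2,Ben,IT\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: records share some keys, but fields vary and can nest.
semi_structured_json = """
[
  {"employee_id": 1, "name": "Ana", "skills": ["SQL", "Excel"]},
  {"employee_id": 2, "name": "Ben", "contact": {"email": "ben@example.com"}}
]
"""
records = json.loads(semi_structured_json)

# Collect the distinct "shapes" (sets of field names) found in each dataset.
structured_columns = {tuple(row.keys()) for row in rows}
semi_structured_keys = {tuple(sorted(rec.keys())) for rec in records}

print(len(structured_columns))    # number of distinct shapes in the CSV rows
print(len(semi_structured_keys))  # number of distinct shapes in the JSON records
```

The CSV rows all share one shape, which is why they drop straight into a relational table; the JSON records do not, so they would need reshaping first.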
Five V's of Big Data
Volume: amount of data
Velocity: speed at which data accumulates
Variety: different data types (videos, coordinates...)
Veracity: integrity of the data
Value: Orgs have seen this as the main reason to use big data
Volume
-Storage has become more readily available
-Aided by advancements in computing power
ex.
• Walmart handles over 1 million transactions per hour.
• That equates to over 40 petabytes of data per day.
• Billions of rows every day.
Velocity
How fast is the data?
We are reporting more data much more quickly
Examples:
• Clickstreams and ads
• Stock trading algorithms
• Machine to machine processes
• Infrastructure and sensors
• Online gaming systems
Variety
What's in the Data?
-Data is now less predictable, less structured
-No longer just numbers, dates, strings.
Veracity
Is the data any good?
Quality, authenticity, and validity of the data
Value
What's the point of the data?
• What an organization gains from retaining or acquiring the data.
Critical to strategic objectives
67% of organizations have started to use big data
Only 6% consider themselves mature in usage
#1 reason = finding correlations across multiple data sources
Basic Steps of Big Data Solution
1. Data Ingestion
2. Data Storage
3. Data Processing
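The three steps above can be sketched as a tiny Python pipeline. Everything here is a simplifying assumption: the click events are invented, a plain list stands in for real storage, and the function names `ingest`, `store`, and `process` are hypothetical.

```python
import json
from collections import Counter


def ingest(raw_lines):
    """Data ingestion: parse incoming raw events, skipping malformed ones."""
    events = []
    for line in raw_lines:
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # drop lines that are not valid JSON
    return events


def store(events, storage):
    """Data storage: append events to a store (a list stands in for a database)."""
    storage.extend(events)
    return storage


def process(storage):
    """Data processing: count click events per page."""
    return Counter(e["page"] for e in storage if e.get("action") == "click")


# Invented sample input: three valid events and one malformed line.
raw = [
    '{"page": "/home", "action": "click"}',
    "not json",
    '{"page": "/cart", "action": "click"}',
    '{"page": "/home", "action": "click"}',
]

storage = store(ingest(raw), [])
clicks_per_page = process(storage)
print(clicks_per_page)
```

In a real big data solution each step would be a separate system, but the order of operations is the same.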
What do we do once we have all this data?
• Analysis
• Making decisions
(Targeting people with ads)
• Algorithms
What's an algorithm?
a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer
ex. following a recipe can be an algorithm
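Like a recipe, an algorithm is a fixed sequence of steps. As a minimal illustration (the function name `largest` is just an example, not from the course material), here is one that finds the largest number in a list:

```python
def largest(values):
    """Return the largest item by following a fixed sequence of steps."""
    if not values:
        raise ValueError("values must not be empty")
    best = values[0]              # Step 1: start with the first item
    for value in values[1:]:      # Step 2: look at each remaining item
        if value > best:          # Step 3: keep whichever is larger
            best = value
    return best                   # Step 4: report the result


print(largest([3, 9, 2]))  # prints 9
```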
3 pillars of security
1. Confidentiality
2. Integrity
3. Availability
SEEP framework
• Security: confidentiality, integrity, availability
• Economics: Where orgs find value in big data
Predict consumer behavior
• Ethics: Ethics & biases involved in the decision-making process
• Privacy: Focused on consumers' preferences
Security vs Privacy
Security: business focused
Privacy: preference of the individual
Security can help enforce privacy policies OR they can be at odds with each other
Ex. Airport: adding security by removing privacy
What is Louis Brandeis' definition of privacy?
"right to be left alone"
- protection from institutional threat: government, press
What is Alan Westin's definition of privacy?
An individual's right "to control, edit, manage, and delete information about them[selves] and decide when, how, and to what extent information is communicated to others."
Privacy
what information goes where?
Individual's preference
Security
protection against unauthorized access
Can help enforce the privacy policies we want to implement
personally identifiable information (PII)
Sensitive Information
the name, postal address, or any other information that allows tracking down the specific person who owns a device
Fair Information Practices (FIP)
8 Principles that govern the collection and use of information about individuals.
1. Collection Limitation
2. Data Quality
3. Purpose Specification
4. Use Limitation
5. Security Safeguards
6. Openness
7. Individual Participation
8. Accountability
Collection Limitation
The collection of personal information should be limited, should be obtained by lawful and fair means, and, where appropriate, with the knowledge or consent of the individual
Data Quality
Personal data should be relevant to the purposes for which they are to be used, and, to the extent necessary for those purposes, should be accurate, complete and kept up-to-date.
Purpose Specification
The purposes for which personal data are collected should be specified not later than at the time of data collection and the subsequent use limited to the fulfillment of those purposes or such others as are not incompatible with those purposes and as are specified on each occasion of change of purpose.
Use Limitation
Personal data should not be disclosed, made available or otherwise used for purposes other than those specified in accordance with Paragraph 9
except: a) with the consent of the data subject; or b) by the authority of law
Security Safeguards
Personal data should be protected by reasonable security safeguards against such risks as loss or unauthorized access, destruction, use, modification or disclosure of data.
keep it from getting in the wrong hands
Openness
There should be a general policy of openness about developments, practices and policies with respect to personal data. Means should be readily available for establishing the existence and nature of personal data, and the main purposes of their use, as well as the identity and usual residence of the data controller.
transparency
Individual Participation
An individual should have the right:
a) to obtain from a data controller, or otherwise, confirmation of whether or not the data controller has data relating to him;
b) to have communicated to him, data relating to him within a reasonable time; at a charge, if any, that is not excessive; in a reasonable manner; and in a form that is readily intelligible to him;
c) to be given reasons if a request made under subparagraphs(a) and (b) is denied, and to be able to challenge such denial; and
d) to challenge data relating to him and, if the challenge is successful to have the data erased, rectified, completed or amended.
The right to review the data held about an individual
Requesting modification of, or challenging, that data
Accountability
A data controller should be accountable for complying with measures which give effect to the principles stated above
IP Address (Internet Protocol Address)
A unique number identifying every computer on the Internet (like 197.123.22.240)
-Visible to site you are visiting
1st party cookie
1st party is the website you are using
ex. amazon
Amazon stores a file based on your visit like your shopping cart
3rd party cookies
Can track your behavior across sites
Usually advertisers
Used to save preferences , shopping cart, etc.
Can track you even if IP changes
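Why a third-party cookie keeps working after an IP change can be shown with a small simulation. This is a sketch, not how any real ad network is implemented: the `AdTracker` class, the site names, and the second IP address are all hypothetical.

```python
import uuid


class AdTracker:
    """Hypothetical third-party tracker embedded on several sites."""

    def __init__(self):
        self.visits = {}  # cookie id -> list of (site, ip) pairs

    def handle_request(self, site, ip, cookie=None):
        # First contact: no cookie yet, so issue a new random identifier.
        if cookie is None:
            cookie = str(uuid.uuid4())
        # Visits are filed under the cookie id, not under the IP address.
        self.visits.setdefault(cookie, []).append((site, ip))
        return cookie


tracker = AdTracker()

# The browser visits one site, then another site from a different IP address,
# sending back the cookie it received the first time.
cookie = tracker.handle_request("news.example", "197.123.22.240")
cookie = tracker.handle_request("shop.example", "203.0.113.9", cookie)

history = tracker.visits[cookie]
print(history)
```

Both visits land in the same history because the browser returns the same cookie, even though the IP address differs.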
online social networks Pros and Cons
Pros:
Simplifies data analysis
-You're telling those sites who you are
Cons:
Single point of attack
Site breach --> attackers can get a lot of info at once
No longer control access to own data
proxy server
A server that acts as an intermediary between a user and the Internet.
VPN (Virtual Private Network)
Encrypted connection over the Internet between a computer or remote network and a private network.
Encryption
Encoding messages in a way that only authorized parties can read them
Converts original information, called plain text, into a difficult-to-interpret form called ciphertext (unreadable text)
Symmetric Encryption
the same key is used to encode and decode
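A toy Python sketch of the "same key both ways" idea, using XOR with a random key as long as the message. This is for illustration only and is an assumption of this note, not a scheme named in the course; real systems use vetted ciphers such as AES through a cryptography library.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy cipher, not for real use)."""
    if len(key) != len(data):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"meet at noon"
key = secrets.token_bytes(len(plaintext))  # the single shared secret key

ciphertext = xor_bytes(plaintext, key)     # encode with the key
recovered = xor_bytes(ciphertext, key)     # decode with the SAME key

print(recovered)
```

Because one key does both jobs, sender and receiver must somehow share it securely beforehand.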
asymmetric encryption
a type of cryptography based on algorithms that require two keys -- one of which is secret (or private) and one of which is public (freely known to others).
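The two-key idea can be illustrated with textbook RSA on very small numbers. This sketch assumes RSA as the example algorithm (the card does not name one), uses tiny primes that offer no security, and needs Python 3.8+ for the modular inverse via `pow`.

```python
# Toy RSA with tiny primes: illustration only, never use numbers this small.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)

e = 17                         # public exponent: (e, n) is the public key
d = pow(e, -1, phi)            # private exponent: (d, n) is the private key


def encrypt(message: int) -> int:
    """Anyone can encrypt using the public key (e, n)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key (d, n) can decrypt."""
    return pow(ciphertext, d, n)


message = 65                   # must be an integer smaller than n
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)

print(ciphertext, recovered)
```

Unlike the symmetric case, the public key can be handed out freely; only the private exponent has to stay secret.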
managing privacy
managing an individual's expectations with respect to appropriate uses of their data
• Policy and technical controls can achieve this end
Non-rivalrous data
supply is not affected by the consumption of the data
Not depleted when you consume it
Non-excludable data
can't exclude particular parties from accessing data
ex. Public data
Data Revenue Streams
1. Direct Sales
2. Data Sharing Agreements-->Collect to sell
3. Targeted Advertising
ex. Target individual based on browsing/location history
How Online Shopping Makes Suckers Out of Us
-Retailers are comparison shopping us by finding out how much we will pay
-Data is key to dynamic pricing, where prices are adjusted frequently as conditions change
-Now each product has multiple price points
hacker
Formerly meant an expert who "hacked" into software or systems to create new solutions, extend what they can do, or improve them.
• Now defined more as an intruder.
Black hats
Malicious hackers who break into computer systems and networks without authorization or permission
gray hat hackers
Come in uninvited; they don't exploit the vulnerabilities they find but go to the company and ask it to fix them
white hat hackers
Invited by an organization to test its systems and help improve its security
Social engineering and phishing tests
Internal Threat Types
• Compromised
• Oblivious: don't know better
• Negligent: know better, but do the wrong thing because they're lazy (USB)
• Malicious
• Professional: infiltrates the organization with that goal in mind
Tools
Malicious code and malware
Virus, worm, botnet, trojan horse
Scanners and Data Acquisition
Penetration Attempts
Denial of Service
Social Engineering
Mitigation methods
1. Technical controls
2. Training (User actions)
3. Managerial and Organizational Controls
4. Legal Mechanisms
Technical Controls
-Access Controls (Authentication, Authorization, Credentialing, etc.)
-Encryption (protects data confidentiality and, with authentication, integrity)
-Anti-malware measures
Authentication vs. Authorization
Authentication-you are who you say you are
Authorization-What do you have access to
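The difference can be sketched as two separate checks in Python. All names here (`USERS`, `PERMISSIONS`, the user "ana", her password, and the roles) are invented for illustration, and the in-memory dictionaries stand in for a real user database.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user store and role-to-permission mapping.
_salt = os.urandom(16)
USERS = {
    "ana": {"salt": _salt, "hash": hash_password("correct horse", _salt), "role": "analyst"},
}
PERMISSIONS = {
    "analyst": {"read_reports"},
    "admin": {"read_reports", "delete_records"},
}


def authenticate(username: str, password: str) -> bool:
    """Authentication: are you who you say you are?"""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(user["hash"], candidate)


def authorize(username: str, action: str) -> bool:
    """Authorization: what do you have access to?"""
    role = USERS[username]["role"]
    return action in PERMISSIONS.get(role, set())


print(authenticate("ana", "correct horse"))  # identity check
print(authorize("ana", "delete_records"))    # permission check
```

A user can pass the first check and still fail the second: proving identity does not grant access to everything.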
What are the two best ways to fight ransomware?
Backups and encrypting
Zero Trust
A security model based on the principle of maintaining strict access controls and not trusting anyone by default, even those already inside the network.
Technical Control
Security System Maintenance
-Patch Management
-Security by design and penetration testing
-System Monitoring and analysis
-Backups!
Training (User actions)
Strong passwords
Change password regularly
Install and keep security software up to date
Be vigilant of security attacks
User Dilemma
No matter what security policies companies put in place, users can find a way to make them obsolete
legal mechanisms
Numerous laws require security behavior by firms
• GLBA - Financial Data
• HIPAA - Health Data
• FERPA - Educational Records
• PRIVACY ACT - Government Records
• Many state laws across data contexts
• Health information, Genetic data, Financial Data, etc.
• GDPR (General and relates to PII)
• EU Data Directive (General and relates to PII)
Notice of Security Breach Act
Requires any company that maintains personal information of California residents to disclose a security breach of that information.
privacy paradox
describes people's willingness to disclose personal information in social media channels despite expressing high levels of concern for privacy protection
Ethics vs. Morals
Ethics are specific rules and actions, or behaviors.
Morals are individual guiding principles
Facebook Emotional Contagion Study
Manipulated whether positive vs. negative posts were shown on your Facebook feed
Found evidence of emotional contagion. If your friends share something positive you feel more positive and vice versa
Unconsented study- ethical dilemma?
Accenture paper on 12 Ethics Principles
1. The highest priority is to respect the persons behind the data.
2. Account for the downstream uses of datasets
3. The consequences of utilizing data and analytical tools today are shaped by how they've been used in the past
4. Seek to match privacy and security safeguards with privacy and security expectations
5. Always follow the law, but understand that the law is often a minimum bar.
6. Be wary of collecting data just for the sake of having more data
7. Data can be a tool of both inclusion and exclusion.
8. As far as possible, explain methods for analysis and marketing to data disclosers.
9. Data scientists and practitioners should accurately represent their qualifications (and limits to their expertise), adhere to professional standards, and strive for peer accountability.
10. Design practices that incorporate transparency, configurability, accountability and auditability
11. Products and research practices should be subject to internal (and potentially external) ethical review
12. Governance practices should be robust, known to all team members and regularly reviewed.
Utilitarianism
Promoting the greater good, for greatest number of people
Consequentialist-morality of an action is only judged by its consequences
What is Bentham's idea of good?
good is pleasure
What is Mill's idea of good?
good is happiness
Mulching proposal
Capture the elderly and process them into a product that can be used for food
Do the ends justify the means?
Does the good outweigh the bad?
Criticism of Utilitarianism
Hard to Quantify and measure "good"
Incommensurate notions of good
Ignores human rights, justice, and distribution of good
No duty to go above and beyond
Deontology
Deon = duty
Primarily concerned with sticking to certain rules or duties
Consequences don't matter
Intention is relevant-act in a certain way for the right reason
Examples of Deontology
Divine command-divine power gives you rules
The Golden rule-treat others the way you want to be treated
Ethic of reciprocity-do unto others as you would want them to do to you
(Depends more on moral agent than person being acted upon)
Natural law and natural right theory: humans possess intrinsic values that govern their reasoning and behaviors
Non-aggression theory
deontology and utilitarianism
Utilitarianism-right or wrongness depends on consequences
Deontological-right or wrongness depends on conformity to certain moral norms
Virtue Ethics
Perspective that what is moral comes from what a mature person with "good" moral character would deem right
Ransomware
Malicious software that encrypts a victim's data and demands a payment for decryption