NumPy = arithmetic; assumes a grid (array) where every value has the same type, either all numbers or all words
Matplotlib = to make charts like in Sheets
Pandas = "panel data"; data structures that look like sheets; columns can have different data types, unlike NumPy
series = one column of data
row= represents one individual in a test
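A minimal sketch of the difference between the two structures, assuming small made-up values (the names `grid`, `df`, and `scores` are only for illustration):

```python
import numpy as np
import pandas as pd

# NumPy: a grid where every element shares one data type.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.dtype)        # a single dtype for the whole array
doubled = grid * 2       # arithmetic applies to every element at once
print(doubled)

# pandas: a sheet-like table where each column can have its own type.
# These rows are hypothetical; each row stands for one individual.
df = pd.DataFrame({"name": ["A", "B"], "score": [90.5, 72.0]})
print(df.dtypes)         # one dtype per column

# A Series is a single column of the DataFrame.
scores = df["score"]
print(type(scores))
```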
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
Import data using Colab
Create your new df (DataFrame) and give it a name
First five rows = head()
Last rows = tail()
Use columns to specify what you want; you can select by number (position) or by name
Can sort a column's values into whatever bins you need.
Conditionals are masks (True/False filters on the rows)
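The steps above could look roughly like this. It is a sketch on made-up data: the column names, bin edges, and labels are assumptions for illustration, and in Colab the DataFrame would more likely come from a file loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical data standing in for an imported file.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
    "age": [15, 22, 37, 41, 58, 63, 29],
    "score": [88, 92, 75, 60, 95, 70, 81],
})

first_five = df.head()     # first five rows by default
last_two = df.tail(2)      # last two rows

ages = df["age"]           # select a column by name
first_col = df.iloc[:, 0]  # select a column by position (number)

# Sort the values of a column into bins: (0, 18], (18, 40], (40, 100].
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 40, 100],
    labels=["young", "middle", "older"],
)

# A conditional produces a mask of True/False values, one per row.
mask = df["score"] > 80
high_scores = df[mask]     # keeps only the rows where the mask is True
print(high_scores)
```

The mask itself is a Series of booleans, so it can be combined with other masks using `&` and `|` before filtering.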
Ethics CH. 3&4
Data is valuable but needs to be collected responsibly.
“The New Oil”
Golden rule: “Treat others’ data as you would have others treat your data”
Five C’s
Consent
An agreement between a service provider (the people who collect the data) and the user. Consent is often binary (all or nothing), and data is often sold without consent.
Ex: asking for the user’s consent.
Clarity
Connected with consent: users need to be told clearly what they are consenting to.
Not everyone clearly understands what type of data can be sold.
Ex: Informing users before they consent about what they are consenting to.
Consistency/ trust:
Trust requires consistency over time.
Ex: Facebook's lack of consistent enforcement of use agreements.
Control/ Transparency:
Lets you track what happens to your data. How much control do you have?
Ex: Europe’s General Data Protection Regulation (GDPR) requires that users’ data be provided to them at their request.
Consequences:
Laws and policies have been put in place to protect people on the internet.
Ex: COPPA (the Children’s Online Privacy Protection Act, which protects children online)
What is missing from ethics conversations?
How some companies charge for services like data removal, when that should be an inherent right that the consumer can exercise easily.
Signal and noise
The printing press kicked off the spread of new ideas and information. The spread of information led to the scientific revolution and the Protestant Reformation.
The Information Age started in the late 1970s.
Big Data: roughly 2.5 quintillion bytes of data are generated each day.
Numbers can’t speak for themselves; we are the ones who assign meaning to the numbers.
Data-driven prediction can succeed
Data science can be used in a plethora of fields.
We are quick to judge data, and in that judgment we rely on it to solve the problems we create.
Humans tend to generalize problems, issues, or solutions.
“Finding patterns in random noise”
What is prediction in the context of data science? = People use the data they find as justification for what they think will happen.
How is it useful, and under what scenarios is it challenging? = Data can help our brains compare and quantify things about the world around us; however, it can be challenging to find nuanced information in simple data.
A scatterplot with speed on the x-axis and HP on the y-axis, with one color for Pokémon that are legendary and another color for those that are not labeled legendary.
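One way this scatterplot might be drawn with Matplotlib. This is a sketch: the rows are made up, and the column names `Speed`, `HP`, and `Legendary` are assumptions about how the real dataset is labeled.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; the real data would be loaded from a file.
pokemon = pd.DataFrame({
    "Speed": [45, 60, 90, 130, 100, 30],
    "HP": [45, 60, 35, 106, 90, 160],
    "Legendary": [False, False, False, True, True, False],
})

# Use the Legendary column as a mask to split the rows into two groups.
legendary = pokemon[pokemon["Legendary"]]
regular = pokemon[~pokemon["Legendary"]]

fig, ax = plt.subplots()
# One scatter call per group, so each group gets its own color and legend entry.
ax.scatter(regular["Speed"], regular["HP"], color="tab:blue", label="Not legendary")
ax.scatter(legendary["Speed"], legendary["HP"], color="tab:red", label="Legendary")
ax.set_xlabel("Speed")
ax.set_ylabel("HP")
ax.legend()
```

In Colab the figure shows at the end of the cell; in a plain script, call `plt.show()` after the last line.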