CD

Data Science notes

NumPy = arithmetic on arrays; assumes a grid in which every value has the same type (either all numbers or all words)

Matplotlib = makes charts, like the ones in Sheets

pandas = panel data; data structures that look like spreadsheets; each column can hold a different data type, unlike NumPy

series = one column of data 

row = one individual in a test
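A minimal sketch of the difference between the two libraries, assuming small made-up values (the column names `name` and `height` are placeholders, not from any class dataset). It shows NumPy forcing one type across the whole grid, while a pandas table keeps a separate type per column.

```python
import numpy as np
import pandas as pd

# NumPy: a grid of numbers supports element-wise arithmetic
grid = np.array([[1, 2], [3, 4]])
doubled = grid * 2

# Mixing a number and a word makes NumPy store everything as text
mixed = np.array([1, "a"])
print(mixed.dtype.kind)  # "U" means a unicode string type

# pandas: each column keeps its own type (made-up placeholder values)
df = pd.DataFrame({"name": ["a", "b"], "height": [1.5, 1.7]})
print(df.dtypes)

# One column of the table is a Series
heights = df["height"]
```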

  • import numpy as np

  • import matplotlib as mpl

  • import matplotlib.pyplot as plt

  • import pandas as pd

  • Import data using Colab

  • Create your new df (DataFrame) and give it a name

  • First five rows = head

  • Last rows = tail

  • Use columns to specify what you want; you can select a column by name or by number (position)

  • Can sort a column's values into whatever bins you need

  • Conditionals act as masks (True/False for each row)
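The steps above can be sketched roughly as follows. The DataFrame here is built by hand from made-up placeholder values instead of an imported file, and the column names, bin edges, and labels are all assumptions chosen for illustration.

```python
import pandas as pd

# Made-up placeholder data standing in for an imported file
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
    "age": [15, 22, 37, 41, 58, 63, 19],
    "score": [88.0, 92.5, 79.0, 65.5, 70.0, 95.0, 81.5],
})

first_five = df.head()   # first five rows by default
last_two = df.tail(2)    # last two rows

ages = df["age"]                 # one column by name (a Series)
subset = df[["name", "score"]]   # several columns by name
first_col = df.iloc[:, 0]        # one column by position

# Sort a column into bins; each bin includes its right edge by default
df["age_group"] = pd.cut(
    df["age"], bins=[0, 18, 40, 100], labels=["minor", "adult", "older"]
)

# A conditional produces a True/False mask, which then filters the rows
mask = df["score"] > 80
high_scorers = df[mask]
```

Passing the mask back into `df[...]` keeps only the rows where the condition is True.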

Ethics CH. 3&4

Data is valuable but needs to be collected responsibly.

“The New Oil”

Golden rule: “Treat others’ data as you would have others treat your data”

Five C’s

  1. Consent

An agreement between a service provider (the people who collect the data) and the user. Consent is often binary, and data is often sold without consent.

Ex: asking for the user’s consent.

  2. Clarity

Connected with consent: users need to be told clearly what they are consenting to.

Not everyone clearly understands what type of data can be sold.

Ex: Informing users, before they consent, about what they are consenting to.

  3. Consistency/trust:

Trust requires consistency over time.

Ex: Facebook's lack of consistent enforcement of use agreements.

  4. Control/transparency:

Being able to track what happens to your data. What amount of control do you have?

Ex: Europe’s General Data Protection Regulation (GDPR) requires that users' data be provided to them at their request.

  5. Consequences:

Laws and policies have been put in place to protect people on the internet.

Ex: COPPA (the Children's Online Privacy Protection Act, which protects children online)

What is missing from ethics conversations?

Some companies charge for services like data removal, when that should be an inherent right that the consumer can exercise easily.

Signal and noise

The printing press kicked off the spread of new ideas and information. The spread of information led to the scientific revolution and the Protestant Reformation.

The Information Age started in the late 1970s.

About 2.5 quintillion bytes of data are generated each day.

Numbers can’t speak for themselves; we are the ones who assign meaning to them.

Data-driven prediction can succeed

Data science can be used in a plethora of fields.

We are quick to judge data, and in that judgment we rely on it to solve the problems we create.

Humans tend to generalize problems, issues, or solutions.

“finding patterns in random noise”

What is prediction in the context of data science? = People use the data they find as justification for what they think will happen.

How is it useful, and under what scenarios is it challenging? = Data can help our brains compare and quantify things about the world around us; however, it can be challenging to find nuanced information in simplistic data.


A scatterplot with Speed on the x-axis and HP on the y-axis, using one color for Pokémon that are labeled legendary and another color for those that are not.
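One possible way to draw that plot is sketched below. The rows are made-up placeholders (not real Pokémon stats), and the column names `Speed`, `HP`, and `Legendary` are assumptions about how the dataset is labeled; adjust them to match the actual file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows; column names are assumptions about the dataset
pokemon = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5"],
    "Speed": [45, 60, 100, 90, 30],
    "HP": [40, 65, 105, 95, 50],
    "Legendary": [False, False, True, True, False],
})

fig, ax = plt.subplots()

# Draw each group separately so each gets its own color and legend entry
for is_legendary, color in [(False, "tab:blue"), (True, "tab:orange")]:
    group = pokemon[pokemon["Legendary"] == is_legendary]  # boolean mask
    label = "Legendary" if is_legendary else "Not legendary"
    ax.scatter(group["Speed"], group["HP"], color=color, label=label)

ax.set_xlabel("Speed")
ax.set_ylabel("HP")
ax.legend()
# In a notebook, the figure displays after this cell; elsewhere call plt.show()
```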