Data Science
A concept that unifies statistics, data analysis, machine learning, and their related methods to understand and analyze actual phenomena with data. Data science is the discipline of making data useful.
Statistics
The science of changing your mind under uncertainty.
Machine Learning
Making labels using examples instead of explicit instructions.
Data Mining/Analytics
The process of finding data to make decisions, including descriptive analytics and exploratory data analysis.
Data Scientist
A person skilled in statistics and programming, possessing knowledge in data collection and application.
Categorical Variables
Variables that represent groups or categories, e.g., color.
Quantitative Variables
Variables that represent amounts or quantities, e.g., age, height.
Data Preparation
The process of converting raw data into a usable format for analysis.
Labeling
The process of ascribing meaning/categories to data, essential for model training.
Annotation
Adding explanatory notes to data, typically done by external or internal teams.
Human in the Loop
A system supervised by humans to improve data collection and application.
Brute Force Collection
A method of data acquisition focused solely on collecting extensive amounts of data.
Datafiction
The belief that data collection started with the internet and computers; suggests data science involves more than just statistics.
Confounding Variables
Variables that can affect the outcome of an analysis but are not directly measured, though they can be measured indirectly.
Bias
Systematic errors that can skew the results of data analysis, including selection and sampling ___.
Histogram
A graphical representation showing the distribution of one quantitative variable.
Bar Chart
A chart representing the frequency of discrete categories.
Scatter Plot
A plot used to show the relationship between two quantitative variables.
Pie Chart
A circular chart that shows the relative frequency of discrete categories.
Data Ethics (Five C's)
Consent, Clarity, Consistency (and Trust), Control, and Consequences.
The formula for the total number of hospitals that list a hip/knee cost:
=COUNT(N:N)
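As a rough sketch of what the COUNT formula above does: it tallies only the cells that hold numbers, so header text and blanks are skipped. The Python below mimics that behavior; the function name `count_numeric` and the sample column values are hypothetical, made up for illustration.

```python
def count_numeric(cells):
    """Mimic spreadsheet COUNT: tally only the cells that hold numbers."""
    return sum(
        1
        for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )


# Hypothetical column N: a header, blanks (None / ""), and three costs.
column_n = ["Hip Knee Cost", 21500.0, None, 19840.0, "", 23010.0]

# Only the three numeric cells are counted.
hospitals_listing_cost = count_numeric(column_n)
print(hospitals_listing_cost)
```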
If your histogram looks like this, your problem is
You have too many bins/inputs.
Fill in the blank: ___ does not imply ___
Correlation; causation
Known as the father of epidemiology and for his cholera experiment
John Snow
A study in which neither the experiment designers nor the participants know who receives the treatment
Double-blind
Three domains
Computer science, domain experience and statistics.
The formula for the average cost of better heart failure treatment:
=AVERAGEIF(K:K,"better",J:J)
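To make the AVERAGEIF logic above concrete, here is a sketch in Python: keep only the rows whose quality cell matches the criterion, then average the matching cost cells. The function `average_if` and the sample rows are hypothetical; the case-insensitive match mirrors how spreadsheet text criteria behave.

```python
def average_if(criteria_cells, criterion, value_cells):
    """Average the values whose paired criteria cell equals the criterion."""
    matched = [
        value
        for cell, value in zip(criteria_cells, value_cells)
        if isinstance(cell, str)
        and cell.lower() == criterion.lower()
        and isinstance(value, (int, float))
    ]
    if not matched:
        raise ValueError("no rows match the criterion")
    return sum(matched) / len(matched)


# Hypothetical columns: K holds the quality rating, J holds the cost.
column_k = ["better", "same", "Better", "worse"]
column_j = [10000.0, 12000.0, 14000.0, 9000.0]

# Averages only the two rows rated "better": (10000 + 14000) / 2.
avg_better_cost = average_if(column_k, "better", column_j)
print(avg_better_cost)
```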
A type of data collection where the sole purpose of an activity is data acquisition
Brute force
The formula for the number of hospitals with better heart attack and better pneumonia quality:
=COUNTIFS(I:I,"better",M:M,"better")
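The COUNTIFS formula above counts a row only when every condition holds at once. A sketch of that logic in Python follows; `count_ifs` and the sample ratings are hypothetical. Note that all ranges must be the same size, which is why the formula pairs one whole column with another.

```python
def count_ifs(*range_criterion_pairs):
    """Count the rows where every (cells, criterion) pair matches."""
    ranges = [cells for cells, _ in range_criterion_pairs]
    if len({len(cells) for cells in ranges}) > 1:
        raise ValueError("all ranges must have the same length")
    criteria = [crit.lower() for _, crit in range_criterion_pairs]
    return sum(
        1
        for row in zip(*ranges)
        if all(
            isinstance(cell, str) and cell.lower() == crit
            for cell, crit in zip(row, criteria)
        )
    )


# Hypothetical quality ratings for four hospitals.
heart_attack_quality = ["better", "same", "better", "better"]
pneumonia_quality = ["better", "better", "worse", "better"]

# Only the first and last hospitals are "better" on both measures.
both_better = count_ifs(
    (heart_attack_quality, "better"),
    (pneumonia_quality, "better"),
)
print(both_better)
```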
This technique lets us isolate one variable in order to establish causality and avoid interference from other variables
Randomization
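One way to picture randomization is as a random split of participants into a treatment group and a control group, so that other variables are, on average, balanced between the two. The sketch below assumes a simple 50/50 split; the function `randomize` and the participant IDs are hypothetical.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)  # a seed makes the split reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant IDs 0 through 9.
groups = randomize(range(10), seed=0)
print(groups["treatment"])
print(groups["control"])
```

Because assignment depends only on the shuffle, no trait of a participant can influence which group they land in.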
What is plotted on the vertical axis of a histogram?
Frequency of points, or counts
The formula for the average cost of a heart attack:
=AVERAGE(H:H)
The method of data acquisition used by the robotics company Boston Dynamics to train its robots:
Human in the loop
Public Data
Data available on the internet; the largest source of data.
Ex.: Using Google Street View to classify cars, then infer voting habits.
Data Preparation
Filtering out impurities, labeling, annotating data, getting users to generate labels, and using tools to speed up annotation.
Filtering out impurities
Data is sorted manually, or with automated tools, to separate impurities but not discard them.
Labeling
The most time-consuming part. It lets you ascribe meaning/categories to the data: what you want your algorithm to support in the wild.
Annotating Data
External annotation service providers, Internal annotation team