Data Analysis Process
A systematic approach to analyzing data: collect the data, clean it, visualize it, and generate new information from it.
Step 1: Collect or Choose Data
Gather the data needed for analysis.
Step 2: Clean/Filter
Remove errors or inconsistencies from the data. Focus on relevant data for analysis.
Step 3: Visualize and Find Patterns
Use graphs/charts to observe data for patterns.
Step 4: Generate New Information
Produce results based on observations.
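The four steps above can be sketched in a few lines of Python. This is only an illustration: the survey responses are made-up data, and the cleaning rule (trim spaces, lowercase, drop blanks) is one assumed choice among many.

```python
from collections import Counter

# Step 1: collect or choose data (hypothetical survey responses).
raw_responses = ["cat", "dog", " Dog", "", "cat", "bird", "CAT", None, "dog", "cat"]

# Step 2: clean/filter. Drop missing entries, then normalize spacing and case.
cleaned = [r.strip().lower() for r in raw_responses if r and r.strip()]

# Step 3: find patterns. Count how often each value occurs.
counts = Counter(cleaned)
for value, count in counts.most_common():
    print(value, count)

# Step 4: generate new information from what was observed.
top_value, top_count = counts.most_common(1)[0]
print(f"Most common answer: {top_value} ({top_count} of {len(cleaned)})")
```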
Data
Information collected for analysis.
Metadata
Data about data including time of data collection, type of data, location of data collection, method of collection, and collector of the data.
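One way to picture metadata is as a small record stored alongside the data itself. The sketch below assumes a hypothetical `Metadata` class whose fields mirror the items listed above; all values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    """Hypothetical record describing a piece of data (not the data itself)."""
    collected_at: str  # time of data collection
    data_type: str     # type of data
    location: str      # location of data collection
    method: str        # method of collection
    collector: str     # who collected the data


# All values below are made up for the example.
photo_metadata = Metadata(
    collected_at="2024-05-01T14:30:00",
    data_type="image/jpeg",
    location="school garden",
    method="phone camera",
    collector="student volunteer",
)
print(photo_metadata)
```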
Bar Charts
Vertical or horizontal visualizations that show how often each value occurs; taller or longer bars indicate more frequent values.
Insights from Bar Charts
Identify the most and least common values, the range of values, and whether a particular value appears at all.
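The same insights can be read from the frequency counts behind a bar chart. A minimal sketch, assuming a made-up list of quiz scores and a text-only chart in place of a real plotting library:

```python
from collections import Counter

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]  # hypothetical quiz scores
counts = Counter(scores)

# Horizontal text bar chart: one '#' per occurrence of each value.
for value in sorted(counts):
    print(f"{value}: {'#' * counts[value]}")

most_common = max(counts, key=counts.get)    # value with the longest bar
least_common = min(counts, key=counts.get)   # value with the shortest bar
value_range = (min(scores), max(scores))     # smallest and largest values
score_of_1_present = 1 in counts             # does the value 1 appear at all?

print(most_common, least_common, value_range, score_of_1_present)
```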
Pie Charts
Visualizations that show each unique value's share of a dataset as a percentage of the whole.
Insights from Pie Charts
Identify highest/lowest percentages and compare values.
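The slices of a pie chart are just percentages of the total. A short sketch with an invented list of commute methods shows how those percentages could be computed:

```python
from collections import Counter

# Hypothetical answers to "How do you get to school?"
transport = ["bus", "car", "bus", "bike", "car", "bus", "walk", "bus"]

counts = Counter(transport)
total = len(transport)

# Each slice is the value's count divided by the total, as a percentage.
percentages = {value: 100 * count / total for value, count in counts.items()}

for value, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{value}: {pct:.1f}%")

largest_slice = max(percentages, key=percentages.get)
```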
Histograms
Visualizations that display the frequency of values within ranges (bins); they are read much like bar charts.
Insights from Histograms
Identify most and least common ranges.
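Unlike a bar chart, a histogram first groups values into ranges. The sketch below assumes made-up ages and a bin width of 10, chosen only for the example:

```python
from collections import Counter

ages = [3, 7, 12, 15, 18, 21, 22, 25, 29, 34, 36, 41]  # hypothetical ages
bin_width = 10  # assumed width of each range

# Map every value to the start of its range: 0-9 -> 0, 10-19 -> 10, and so on.
bins = Counter((age // bin_width) * bin_width for age in ages)

for start in sorted(bins):
    print(f"{start:>2}-{start + bin_width - 1:<2} {'#' * bins[start]}")

most_common_start = max(bins, key=bins.get)
least_common_start = min(bins, key=bins.get)
```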
Scatterplots
Visualizations that compare two data columns to find a relationship, which can be direct (both rise together), inverse (one rises as the other falls), or absent.
Insights from Scatterplots
Identify relationships and trends; make predictions.
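A trend in a scatterplot can be summarized with a straight line and then used to predict. This is a minimal sketch using an ordinary least-squares fit, which is one common technique and not something the cards prescribe; the study-hours data is invented.

```python
# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 78, 89]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept of the line y = slope * x + intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# A positive slope suggests a direct relationship, a negative one an inverse relationship.
if slope > 0:
    relationship = "direct"
elif slope < 0:
    relationship = "inverse"
else:
    relationship = "none"

# Prediction for a value that is not in the data.
predicted_score = slope * 6 + intercept
print(relationship, round(slope, 2), round(predicted_score, 1))
```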
Correlation
A relationship in which two variables tend to change together, showing an apparent pattern between data sets.
Causation
A relationship in which one event directly brings about another.
Important Note on Correlation and Causation
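Correlation does not equal causation: two variables can move together without one causing the other, for example when a third factor influences both.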
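A small worked example can make the distinction concrete. The sketch below computes the Pearson correlation coefficient by hand for invented ice cream and sunburn figures. The numbers are assumptions chosen so that both follow hot weather; the strong correlation between them says nothing about one causing the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect direct relationship, -1 a perfect inverse one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily figures; both were invented to rise on hotter days.
ice_cream_sales = [40, 48, 55, 63, 68, 77]
sunburn_cases = [2, 3, 5, 6, 8, 9]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"r = {r:.2f}")  # strongly correlated, yet ice cream does not cause sunburn
```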
Big Data
Very large data sets, often gathered through data mining and web scraping, used to solve problems such as improving business efficiency and identifying diseases.
Open Data
Freely available data with minimal restrictions, sourced from open data repositories.
Crowdsourced Data
Data contributed by many ordinary people and used to inform decisions.
Machine Learning
Algorithms that analyze data and adapt based on what they find; used in everyday tasks and in AI.
Limitations and Bias in Machine Learning
Algorithms may reflect human biases if the input data is not diverse.
Example of Bias
Twitter's cropping algorithm favored certain demographics due to biased training data.
Ways to Mitigate Bias
Diversify training data by including underrepresented groups.
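Before diversifying training data, it helps to measure how each group is represented. The sketch below is only an illustration: the group labels, the counts, and the 10% threshold are all assumptions, not a standard.

```python
from collections import Counter

# Hypothetical group label for each example in a training set.
training_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5

counts = Counter(training_groups)
total = len(training_groups)
shares = {group: count / total for group, count in counts.items()}

# Assumed cutoff: flag any group that makes up less than 10% of the data.
threshold = 0.10
underrepresented = sorted(g for g, share in shares.items() if share < threshold)

for group in sorted(shares):
    print(f"{group}: {shares[group]:.0%}")
print("Collect more examples for:", underrepresented)
```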
Simulation
A model of real-world situations/events useful for hypothesis testing when real experimentation is impractical or risky.
Usage of Simulations
Simulations abstract complex processes and provide insights that would be hard to obtain in real life.
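As a small illustration, rolling two dice 100,000 times by hand is impractical, but a simulation can do it in a moment. The sketch below estimates how often the dice sum to 7; the trial count and the seed are arbitrary choices for the example.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so repeated runs give the same result
trials = 100_000

# Model each trial as two independent rolls of a fair six-sided die.
sevens = 0
for _ in range(trials):
    if rng.randint(1, 6) + rng.randint(1, 6) == 7:
        sevens += 1

estimate = sevens / trials
print(f"Estimated chance of rolling a 7: {estimate:.3f}")
print(f"Exact chance for fair dice:      {1 / 6:.3f}")
```

The estimate should land close to the exact value of 1/6, and it tends to get closer as the number of trials grows.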