Differentiate between what data shows and why that might be the case.
What -> Fact
Why -> Opinion
Explain the usefulness of metadata.
Metadata is data about data:
-It can be changed without impacting the primary data.
-Used for finding, organizing, and managing information
-Increases effective use of data by providing extra information
-Allows data to be structured and organized
What is the data analysis process?
Choose or Collect Data -> Clean and/or Filter -> Visualize and Find Patterns -> New Information
How do data visualizations help us?
-Look at lots of data at once
-See patterns that are "invisible" if you just look at the table
Explain the reasons that someone would create either a bar chart or a histogram in order to explore a single column of data.
Bar Chart: Count how many times each value in the column appears and make a bar at that height.
What value(s) are most common in this column?
What value(s) are least common in this column?
What is the unique list of values in this column?
Histogram: Similar to a bar chart, but first all numbers in a range or "bucket" are grouped together.
What range of value(s) are most common in this column?
What range of value(s) are least common in this column?
What ranges of values do or do not appear in this column?
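The difference between the two charts can be sketched in a few lines of Python. This is only an illustration: the `ages` column and the bucket width of 10 are made-up assumptions, and the counts stand in for the bar heights a charting tool would draw.

```python
from collections import Counter

# Hypothetical column of ages, invented for illustration.
ages = [14, 15, 15, 16, 16, 16, 17, 22, 25, 31]

# Bar chart: one bar per distinct value, height = number of occurrences.
bar_counts = Counter(ages)

# Histogram: first group values into buckets of width 10, then count.
# The bucket label is the lower edge, so 10 means the range 10-19.
bucket_width = 10
hist_counts = Counter((age // bucket_width) * bucket_width for age in ages)

most_common_value, _ = bar_counts.most_common(1)[0]
most_common_bucket, _ = hist_counts.most_common(1)[0]

print(most_common_value)   # 16
print(most_common_bucket)  # 10
print(sorted(bar_counts))  # the unique list of values in the column
```

The bar chart answers questions about individual values, while the histogram answers the same questions about ranges.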
How do programs help us with data?
Programs (like the Data Visualizer) can help process data so we can understand it and learn.
How do charts help with data? What are two types of charts?
Charts and other visualizations can help both find and communicate what we've learned from data.
Bar charts and histograms are two common chart types for exploring one column of data in a table.
Explain why data needs to be cleaned.
-Data is incomplete
-Data is invalid
-Multiple tables are combined into one
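A minimal sketch of cleaning, covering only the first two reasons (incomplete and invalid data). The survey rows and the `clean` helper are hypothetical, and the rule "age must be a whole number" is an assumption chosen for the example.

```python
# Hypothetical survey rows; None marks a missing answer.
rows = [
    {"name": "Ana", "age": "15"},
    {"name": "Ben", "age": None},       # incomplete: no age given
    {"name": "Cy", "age": "fifteen"},   # invalid: not a number
    {"name": "Dee", "age": "16"},
]

def clean(rows):
    """Keep only rows whose age is present and parses as a whole number."""
    cleaned = []
    for row in rows:
        age = row["age"]
        if age is None or not age.isdigit():
            continue  # skip incomplete or invalid rows
        cleaned.append({"name": row["name"], "age": int(age)})
    return cleaned

cleaned_rows = clean(rows)
print(cleaned_rows)
# [{'name': 'Ana', 'age': 15}, {'name': 'Dee', 'age': 16}]
```

Dropping rows is only one option; depending on the data, a row might instead be corrected by hand.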
What does filtering data allow a person to do?
Filtering data allows the user to look at a subset of the data.
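As a small illustration, filtering can be written as keeping only the rows that match a condition. The table and the condition (state equals "CA") are assumptions made up for this sketch.

```python
# Hypothetical table of test scores by state.
rows = [
    {"state": "CA", "score": 82},
    {"state": "NY", "score": 91},
    {"state": "CA", "score": 75},
]

# Filter: keep only the subset of rows where the state is "CA".
ca_rows = [row for row in rows if row["state"] == "CA"]

print(len(ca_rows))  # 2
```

The original table is left unchanged; the filter only produces a smaller view of it.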
Explain the reasons that someone would create either a crosstab or a scatter chart in order to explore two columns of data.
Bar charts and histograms are only useful for looking at one column of data. If we want to look at relationships between two pieces of information we'll need ways to visualize data that look at two columns of data at the same time.
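One way to picture a crosstab is as a count of every combination of values from two columns. The sketch below uses invented (grade, pet) pairs; the column names and values are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical two-column data: (grade, favorite pet) for each student.
pairs = [
    ("9th", "dog"),
    ("9th", "cat"),
    ("10th", "dog"),
    ("9th", "dog"),
    ("10th", "dog"),
]

# Crosstab: count how often each combination of the two columns appears.
crosstab = Counter(pairs)

grades = sorted({grade for grade, _ in pairs})
pets = sorted({pet for _, pet in pairs})

# Print one row per grade and one column per pet.
print("grade", *pets)
for grade in grades:
    print(grade, *(crosstab[(grade, pet)] for pet in pets))
```

A scatter chart serves the same purpose when both columns are numeric: each row becomes one point, placed by its two values.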
How can we develop knowledge from our world?
We can develop insights and knowledge about our world from manipulating and visualizing data, in particular by finding patterns.
What can we see when observing data? Can we know the cause of the correlation?
When investigating two columns of data, we can observe patterns in how different values are correlated. We cannot know for certain the cause of the correlation.
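To make "correlated" concrete, here is a sketch that computes the Pearson correlation coefficient, one common measure of how strongly two columns move together. The `pearson` helper and the study-hours data are hypothetical. A value near 1 shows the columns rise together, but it says nothing about why.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical columns: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]

r = pearson(hours, scores)
print(round(r, 3))  # close to 1: a strong positive correlation
```

Even with a strong correlation like this one, the data alone cannot tell us whether studying caused the higher scores.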
Define and explain the impacts of crowdsourcing, crowdfunding, and citizen science.
Crowdsourcing is the practice of obtaining input or information from a large number of people via the Internet.
-offers new models for collaboration, such as connecting businesses or social causes with funding
Citizen science is research where some of the data collection is done by members of the public using their own computing devices, which helps solve scientific problems.
Explain why in some contexts large amounts of data need to be analyzed in parallel and scalable systems.
When data gets too big, it can no longer be processed on one computer. Cloud computing or parallel systems are sometimes used to help process all that information.
In general, scalability of your system is important to consider when working with big data. You want your system to be able to work even as you're using more and more data.
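The idea of splitting work across several workers can be sketched on a single machine. This is a toy example under stated assumptions: the word-count task, the `parallel_word_count` helper, and the choice of four workers are all made up for illustration, and threads stand in for the separate computers a real big-data system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Count the words in one chunk of lines."""
    return Counter(word for line in chunk for word in line.split())

def parallel_word_count(lines, workers=4):
    """Split the lines into chunks, count each chunk separately, then combine."""
    chunk_size = max(1, len(lines) // workers)
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed independently; results are merged at the end.
        for partial in pool.map(count_words, chunks):
            total += partial
    return total

# Hypothetical data set: the same three lines repeated 100 times.
lines = ["big data big systems", "data data everywhere", "scalable systems"] * 100
totals = parallel_word_count(lines)
print(totals["data"])  # 300
```

Because each chunk is counted on its own, the same approach keeps working as the data grows: add more chunks and more workers.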
Explain the impact of open data on scientific research and discovery.
Open data is publicly available data shared by governments, organizations, and others. Making data open helps spread useful knowledge or creates opportunities for others to use it to solve problems.
Recognize and explain potential bias in a dataset or interpretation.
Humans are biased about certain things, whether they mean to be or not. When a system is entirely open to new information, including biased information, it is only natural that the computer would show some bias too, since that is what it learned from. This biased information can lead to biased answers, which can either help or harm a person or their learning.