What is the core definition of Data Mining?
The discovery of patterns in large datasets.
What is the goal of Data Mining?
To turn data (raw facts) into information (contextualized facts) and finally into knowledge (actionable insights).
Why can't data be mined by hand?
Because the datasets are too large, change too fast, or are too complex for humans.
What is the difference between correlation and causation in data mining?
Data mining finds correlations (A and B happen at the same time) but does NOT prove causation (A caused B).
Give an example of a correlation found through data mining.
A computer finds that people who buy diapers also buy beer.
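To make the diapers-and-beer example concrete, here is a minimal Python sketch of how such a co-occurrence might be measured. The baskets are invented for illustration, and `support` and `confidence` are hypothetical helper names that borrow standard association-rule terminology. A high confidence value only says the items appear together often; it says nothing about one purchase causing the other.

```python
# Hypothetical shopping baskets (illustrative data only, not real sales records).
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

both = support({"diapers", "beer"}, baskets)
conf = confidence({"diapers"}, {"beer"}, baskets)
print(f"support={both:.2f}, confidence={conf:.2f}")  # support=0.60, confidence=0.75
```

Here 3 of 5 baskets hold both items, and 3 of the 4 diaper baskets also hold beer: a correlation, not a cause.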
What is anomaly detection?
Identifying data points that don't fit the expected pattern; it is commonly used in fraud detection.
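One simple way to flag points that don't fit the pattern is a z-score test: measure how many standard deviations each value lies from the mean. The sketch below assumes made-up transaction amounts and an arbitrary threshold of 2; `find_anomalies` is a hypothetical name, and real fraud detection typically relies on much richer features and models.

```python
from statistics import mean, stdev

# Hypothetical card transaction amounts in dollars; the last one is unusually large.
amounts = [42.0, 38.5, 51.2, 47.9, 40.3, 44.1, 39.8, 1250.0]

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(find_anomalies(amounts))  # [1250.0]
```

Note that the outlier itself inflates the mean and standard deviation, which is one reason the threshold has to be chosen with care.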
What is classification in data mining?
Sorting items into predefined categories, such as labeling an email as 'Spam' or 'Inbox'.
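The sketch below illustrates classification in miniature, assuming a toy word-counting approach rather than a production spam filter: it learns which words appear in labeled 'Spam' and 'Inbox' examples, then assigns a new email to whichever category its words match more often. The training emails and the `train` and `classify` functions are hypothetical.

```python
from collections import Counter

# Hypothetical labeled training emails (illustrative only).
training = [
    ("win a free prize now", "Spam"),
    ("claim your free money", "Spam"),
    ("meeting moved to noon", "Inbox"),
    ("notes from today's meeting", "Inbox"),
]

def train(examples):
    """Count how often each word appears in each category."""
    counts = {"Spam": Counter(), "Inbox": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Pick the category whose training words overlap most with the email (ties go to Inbox)."""
    words = text.lower().split()
    spam_score = sum(counts["Spam"][w] for w in words)  # Counter returns 0 for unseen words
    inbox_score = sum(counts["Inbox"][w] for w in words)
    return "Spam" if spam_score > inbox_score else "Inbox"

model = train(training)
print(classify("free prize inside", model))       # Spam
print(classify("agenda for the meeting", model))  # Inbox
```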
What is metadata?
Data about data, describing the properties of a file or message without showing the content.
Provide an example of metadata for a digital photo.
Date taken, GPS coordinates, camera model, file size.
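Metadata can often be read without opening a file's content. As a minimal sketch, assuming a local file system, Python's `os.stat` returns file-system metadata such as size and modification time. This covers only file-level metadata; photo details like GPS coordinates or camera model are stored in embedded EXIF tags and would need a separate image library.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("private message body")
    path = f.name

# os.stat reads file-system metadata without opening or reading the file's content.
info = os.stat(path)
metadata = {
    "size_bytes": info.st_size,
    "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
}
print(metadata)
os.remove(path)  # clean up the temporary file
```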
What is re-identification in data mining?
The process of taking 'anonymous' data and combining it with public datasets to identify individuals.
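The sketch below shows how re-identification can work, assuming two invented datasets: an 'anonymized' table with names removed but ZIP code, birth date, and sex kept, and a public table that lists names alongside the same fields. Joining them on the shared fields exposes anyone who is the only match. The records, field names, and `link` function are hypothetical.

```python
# Hypothetical "anonymized" health records: names removed, but ZIP code, birth date, and sex kept.
health_records = [
    {"zip": "02138", "birth_date": "1960-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1985-03-12", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public dataset (e.g., a voter roll) that includes names with the same fields.
public_records = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1960-07-31", "sex": "F"},
    {"name": "John Roe", "zip": "02140", "birth_date": "1990-01-01", "sex": "M"},
]

def link(anonymous, public, keys=("zip", "birth_date", "sex")):
    """Join two datasets on shared fields; a unique match re-identifies a person."""
    matches = []
    for record in anonymous:
        candidates = [p for p in public if all(p[k] == record[k] for k in keys)]
        if len(candidates) == 1:  # exactly one person fits, so the record is no longer anonymous
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches

print(link(health_records, public_records))  # [('Jane Doe', 'hypertension')]
```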
What is algorithmic bias?
Bias in a mining tool's results caused by the biased data used to train it, which can reflect the priorities and assumptions of the humans who created the tool.
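As an illustration, assume a hypothetical set of past hiring decisions in which group B was rarely hired. A tool that simply learns the most common past outcome for each group reproduces that pattern. The data and the `train` function are invented for this sketch; the point is that the model has no way to tell a fair pattern from a biased one.

```python
from collections import Counter, defaultdict

# Hypothetical historical hiring decisions in which group "B" was rarely hired.
history = [("A", "hire")] * 8 + [("A", "reject")] * 2 + [("B", "hire")] * 2 + [("B", "reject")] * 8

def train(records):
    """Learn the most common past decision for each group."""
    by_group = defaultdict(Counter)
    for group, decision in records:
        by_group[group][decision] += 1
    return {g: c.most_common(1)[0][0] for g, c in by_group.items()}

model = train(history)
print(model)  # {'A': 'hire', 'B': 'reject'}
```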
What is the privacy trade-off in data mining?
Giving up some privacy in exchange for convenience, such as sharing location data to get faster navigation.
What is the significance of data cleaning before mining?
Fixing errors in the data (such as duplicates, missing values, and typos) so that the mining results are accurate.
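A minimal cleaning pass might normalize formatting, then drop missing, impossible, and duplicate records. The raw rows, the `clean` function, and the rules below (for example, treating ages outside 0 to 120 as errors) are assumptions chosen for illustration.

```python
# Hypothetical raw survey rows with typical errors:
# inconsistent case/whitespace, a duplicate, a missing value, and an impossible age.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "alice", "age": "34"},
    {"name": "Bob", "age": ""},
    {"name": "Carol", "age": "-5"},
    {"name": "Dan", "age": "29"},
]

def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()  # normalize spacing and capitalization
        if not row["age"].strip():
            continue  # drop rows with a missing age
        age = int(row["age"])
        if not 0 <= age <= 120:
            continue  # drop physically impossible ages
        key = (name, age)
        if key in seen:
            continue  # drop duplicates revealed by normalization
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

print(clean(raw))  # [{'name': 'Alice', 'age': 34}, {'name': 'Dan', 'age': 29}]
```

Dropping bad rows is only one option; depending on the analysis, missing values are sometimes filled in instead.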
What does scalability refer to in data mining?
The ability to handle huge amounts of data efficiently.
What does metadata reveal about a file?
Information such as the time and date, the sender's phone number, or the browser type.
What is the relationship between combining datasets and re-identification?
Combining datasets can lead to re-identification of individuals from anonymous data.
What is the main challenge with anonymous datasets?
No dataset is truly 100% anonymous if it can be cross-referenced with other data.
What is the role of computational tools in data mining?
They are necessary to analyze large and complex datasets that cannot be handled manually.
What is the impact of computing on privacy and ethics in data mining?
Computing impacts privacy through data collection and can introduce ethical concerns regarding bias and re-identification.