Flashcards covering key vocabulary and concepts related to Big Data, data usability, analysis methods, and data storage.
Big Data
Extremely large sets of data that cannot be easily managed or analyzed using traditional tools.
Usable Data
Data that is clean, well organized, and accessible, so it can be easily understood and processed.
Useful Data
Data that is relevant, timely, and appropriate for solving a specific problem or answering a question.
Structured Data
Organized data formatted into rows and columns, such as spreadsheets and databases.
Unstructured Data
Data that has no predefined format, like emails, images, and social media posts.
Data Extraction
The process of pulling specific, meaningful information from raw or unstructured data.
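As a rough illustration of data extraction, the sketch below pulls email addresses and dates out of a free-text sentence with regular expressions. The sample text, the patterns, and the name `extract_fields` are assumptions made up for this example; real extraction usually needs more careful patterns.

```python
import re

# Hypothetical unstructured text; the people and addresses are invented.
RAW_TEXT = "Contact ana@example.com by 2024-03-15, or bob@example.org after 2024-04-01."

# Simplified patterns, good enough for this sample only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def extract_fields(text):
    """Return the email addresses and ISO-style dates found in free text."""
    return {
        "emails": EMAIL_PATTERN.findall(text),
        "dates": DATE_PATTERN.findall(text),
    }

extracted = extract_fields(RAW_TEXT)
print(extracted)
```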
Metadata
Data about data; information that helps in organizing, finding, and understanding stored data.
Data Persistence
The ability of data to be saved and retained over time, even after a program or device is shut off.
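One simple way to picture data persistence is writing values to a file and reading them back later. The sketch below assumes a JSON file and made-up scores; it uses a temporary directory only so the example cleans up after itself, whereas truly persistent data would live at a permanent path.

```python
import json
import os
import tempfile

def save_scores(path, scores):
    """Write the scores dictionary to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(scores, f)

def load_scores(path):
    """Read the scores dictionary back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical data and file name, stored in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.json")
    save_scores(path, {"ana": 91, "bob": 78})
    restored = load_scores(path)  # the data survived the round trip to disk

print(restored)
```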
PII
Personally Identifiable Information; data that can identify a person, such as a name or Social Security number.
Descriptive Analysis
Analysis that summarizes what happened, with high confidence but low future utility.
Predictive Analysis
Analysis that estimates what might happen, with moderate confidence and medium utility.
Prescriptive Analysis
Analysis that suggests actions based on data, with lower confidence but high decision-making utility.
Classification
A data mining strategy that assigns data to predefined categories.
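To make classification concrete, here is a minimal sketch of one possible classifier: a 1-nearest-neighbor rule that gives a new point the label of its closest labeled example. The measurements, the labels, and the name `classify` are hypothetical, and many other classification methods exist.

```python
import math

# Hypothetical labeled examples: (height_cm, weight_kg) -> predefined category.
TRAINING = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]

def classify(point):
    """Assign the label of the closest training example (1-nearest neighbor)."""
    nearest = min(TRAINING, key=lambda example: math.dist(example[0], point))
    return nearest[1]

label_small = classify((22.0, 4.5))
label_large = classify((58.0, 28.0))
print(label_small, label_large)
```

The categories ("cat" and "dog") are fixed in advance, which is what separates classification from clustering.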
Clustering
A data mining strategy that groups similar data points based on features.
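As one example of clustering, the sketch below runs a bare-bones k-means procedure on a list of numbers: each value joins the nearest center, then each center moves to the mean of its group. The values, the starting centers, and the name `k_means_1d` are assumptions chosen for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group's mean."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            # Index of the center closest to this value.
            idx = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[idx].append(value)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical data with two visible groups; no labels are supplied.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```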
Regression
A predictive analysis method that predicts continuous values based on trends.
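A common form of regression is fitting a straight line by least squares and using it to estimate a new value. The sketch below assumes made-up study-hours and test-score data; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied and the resulting test scores.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)
predicted = slope * 5.0 + intercept  # estimated score for 5 hours of study
print(slope, intercept, predicted)
```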
Model
A simplified representation of a system used to understand complex systems and predict outcomes.
Simulation
A dynamic model that allows testing in a virtual environment safely and cost-effectively.
Web Scraping
A method of automatically extracting data from websites, typically by parsing the content of their pages (such as the HTML).
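The parsing step of web scraping can be sketched with Python's standard `html.parser` module. The HTML below is a hypothetical page held in a string, so nothing is downloaded; a real scraper would first fetch the page and should respect the site's terms of use. `LinkCollector` is a name invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <a href="/item/1">Widget</a>
  <a href="/item/2">Gadget</a>
</body></html>
"""

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only record text that appears inside an <a> tag with an href.
        if self._current_href is not None and data.strip():
            self.links.append((self._current_href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
print(links)
```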
Screen Scraping
A process that captures data from the display output of a computer program.