These flashcards cover key concepts from Chapter Two of the lecture on Data Science, addressing definitions, characteristics, and distinctions related to data and its processing.
What is Data Science?
A multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.
What types of data representations are there?
Data can be represented as structured, semi-structured, or unstructured.
What is the Data Processing Cycle?
The set of operations used to transform data into useful information, including data collection, input, processing, output, and storage.
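As a rough illustration, the five stages named above can be sketched as one small function each. The sample readings, the function names, and the in-memory `archive` list are all hypothetical stand-ins chosen for this sketch, not part of the lecture material.

```python
import json


def collect():
    # Collection: gather raw readings (hypothetical sample values).
    return ["21.5", "22.0", "bad", "23.5"]


def to_input(raw):
    # Input: convert raw strings to numbers, skipping values that cannot be parsed.
    values = []
    for item in raw:
        try:
            values.append(float(item))
        except ValueError:
            continue
    return values


def process(values):
    # Processing: turn data into information (here, simple summary statistics).
    return {"count": len(values), "mean": sum(values) / len(values)}


def output(info):
    # Output: present the information in a human-readable form.
    return f"{info['count']} readings, mean {info['mean']:.2f}"


def store(info, archive):
    # Storage: keep the result for later use (a list stands in for a database).
    archive.append(json.dumps(info))


archive = []
info = process(to_input(collect()))
report = output(info)
store(info, archive)
print(report)
```

Each stage consumes the result of the previous one, which is why the operations are described as a cycle: stored output can become the collected input of a later run.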
What defines structured data?
Data that adheres to a predefined data model and is therefore straightforward to analyze. It is typically stored in tabular form, as in Excel spreadsheets or SQL databases.
What characterizes unstructured data?
Data that has no predefined data model. It is typically text-heavy, may also include audio and video files, and requires more complex processing methods than structured data.
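The difference between the representations is easiest to see side by side. The sketch below uses made-up sample data: a CSV table as structured data, JSON as a common example of semi-structured data, and free text as unstructured data. The variable names and contents are illustrative assumptions only.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (id, name, age).
structured = "id,name,age\n1,Ada,36\n2,Alan,41\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: records describe their own fields, and fields may vary.
semi_structured = '[{"id": 1, "name": "Ada", "tags": ["math"]}, {"id": 2, "name": "Alan"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no data model; even a word count needs ad hoc parsing.
unstructured = "Ada wrote notes on the Analytical Engine. Alan wrote about computable numbers."
word_count = len(unstructured.split())

print(rows[1]["age"])        # a column lookup works because the schema is fixed
print("tags" in records[1])  # the second record simply lacks the optional field
print(word_count)
```

Querying the table is a direct column lookup, the JSON records must be checked for optional fields, and the free text offers no fields at all, which is why unstructured data calls for heavier processing.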
What is Big Data?
Big data refers to large and complex datasets that are difficult to process using traditional data management tools and applications.
What are the four key characteristics of Hadoop?
Hadoop is economical, reliable, scalable, and flexible.
What is the importance of data trustworthiness in Big Data?
Data trustworthiness (often called veracity) is the degree to which the data is accurate and reliable; it determines how far decisions based on Big Data can be relied upon.
What are some application domains of Data Science?
Healthcare, marketing, finance, manufacturing, and social media are examples of application domains for data science.
What is the goal of the Big Data lifecycle?
To surface insights and connections in large volumes of heterogeneous data that conventional methods could not reveal.