Flashcards on Data Science and Big Data Fundamentals.
Data Science
Domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions.
Capture (Data Science Lifecycle)
Data Acquisition, Data Entry, Signal Reception, Data Extraction.
Maintain (Data Science Lifecycle)
Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture.
Process (Data Science Lifecycle)
Data Mining, Clustering/Classification, Data Modeling, Data Summarization.
Analyze (Data Science Lifecycle)
Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis.
Communicate (Data Science Lifecycle)
Data Reporting, Data Visualization, Business Intelligence, Decision Making.
Big Data
Extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time.
Big Data
Massive amount of structured and unstructured data that a company encounters on a daily basis and studies for insights.
Volume (Big Data)
The amount of data generated and stored.
Velocity (Big Data)
The speed at which data is generated, collected, and processed.
Structured Data
Data that has been arranged in a predefined format, such as the rows and columns of a relational database.
Semi-structured Data
A type of data that is only partly organized: it does not follow a traditional tabular structure, but it carries tags or keys (as in JSON or XML) that separate its elements.
Unstructured Data
Data that has not been arranged and doesn't fit cleanly into a relational database's standard row and column structure.
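The three cards above can be illustrated with one piece of information stored three ways. This is a minimal sketch using only the Python standard library; the order, customer name, and field names are made up for illustration and are not part of the deck.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "order_id,customer,total\n1001,Ana,25.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but fields may nest or vary per record.
json_text = '{"order_id": 1001, "customer": {"name": "Ana"}, "tags": ["gift"]}'
record = json.loads(json_text)

# Unstructured: free text with no predefined fields.
note = "Ana called to say order 1001 arrived late but she was happy with it."

structured_total = float(rows[0]["total"])   # read by column name
semi_name = record["customer"]["name"]       # read by walking nested keys
word_count = len(note.split())               # only generic text operations apply
```

The structured and semi-structured forms can be queried by field, while the free-text note would need text mining before any field could be extracted from it.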
Cost Savings (Big Data)
Helps in providing business intelligence that can reduce costs and improve the efficiency of operations.
Data
Collection of facts and figures used for analysis or a survey; a series of observations representing the various values of a quantity.
Quantitative Data
Numerical values are assigned to the characteristics or properties of objects or events, according to logically accepted rules.
Qualitative Data
The researcher considers the phenomenon as a whole and does not attempt to analyze it in measurable or quantifiable terms.
Nominal Scale
Used when a set of objects is to be sorted into two or more categories on the basis of certain clearly known characteristics.
Ordinal Scale
Corresponds to quantitative classification of a set of objects done with the help of ranking on a continuum.
Interval Scale
Based on equal units of measurement; indicates how much or how little of a given characteristic or attribute is present.
Ratio Scale
The highest level of measurement; assumes the existence of an absolute zero.
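One way to remember the four scales is by the summary that is meaningful at each level. The sketch below uses Python's standard `statistics` module; the variable names and sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample values, one list per measurement scale.
blood_type = ["A", "O", "O", "B"]    # nominal: category labels only
satisfaction = [1, 3, 2, 3, 3]       # ordinal: 1 = low, 2 = medium, 3 = high
temperature_c = [10.0, 20.0, 30.0]   # interval: equal units, no true zero
weight_kg = [40.0, 80.0]             # ratio: equal units and an absolute zero

nominal_mode = statistics.mode(blood_type)        # most frequent category
ordinal_median = statistics.median(satisfaction)  # middle rank
interval_mean = statistics.mean(temperature_c)    # differences are meaningful
ratio_times = weight_kg[1] / weight_kg[0]         # "twice as heavy" is meaningful
```

Each scale also supports the summaries of the scales before it, so a ratio variable has a mode, a median, and a mean as well.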
Data Characteristics
Attributes include accuracy, completeness, consistency, timeliness, relevance, validity, and reliability.
Accuracy
Data should be free from errors and accurately reflect the real-world situation.
Completeness
Data should include all necessary information fields without missing values.
Timeliness
Data should be current and available when needed for decision-making.
Relevance
Data should be relevant to the specific task or analysis at hand, avoiding unnecessary or irrelevant information.
Reliability
Data should produce consistent results over time and be trustworthy.
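Some of these characteristics can be checked mechanically. The sketch below scores completeness and timeliness for a couple of records; the schema, the 90-day freshness threshold, and the function names are assumptions made for this example, not definitions from the deck.

```python
from datetime import date

REQUIRED_FIELDS = ("id", "email", "updated")  # hypothetical schema

records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2023, 1, 10)},
]

def completeness(record):
    """Fraction of required fields that are present and not None."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field) is not None)
    return filled / len(REQUIRED_FIELDS)

def is_timely(record, today, max_age_days=90):
    """True if the record was updated within max_age_days of today."""
    return (today - record["updated"]).days <= max_age_days

today = date(2024, 2, 1)  # fixed date so the example is reproducible
scores = [completeness(r) for r in records]
timely = [is_timely(r, today) for r in records]
```

Here the first record is complete and current, while the second is missing its email and is more than a year old. Accuracy, relevance, and reliability usually need a reference source or human judgment and are not covered by this sketch.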
Data Collection
The process of collecting, measuring and analyzing different types of information using a set of standard validated techniques.
Primary Data
Data collected from first-hand experience directly from the main source.
Secondary Data
Information that has already been collected or gathered by other researchers.
Observation Method
Data is collected in the field through direct observation, with the observer personally visiting the site.
Participant Observation
The researcher actively engages in the daily activities of the subjects being studied.
Non-Participant Observation
The researcher observes subjects without becoming involved in their activities.
Questionnaire Method
A set of questions arranged logically, divided into groups, with the object of collecting information for research.
Schedule Method
Similar to a questionnaire, but it is filled in by an enumerator rather than by the respondent.
Survey Method
A detailed inspection or investigation.
Panel Method
Data is collected from the same sample respondents at some interval either by mail or by personal interview.
Case Study Method
An intensive investigation of the particular unit under consideration.
Raw Data
Unprocessed data that has been collected and recorded directly from a source without any manipulation, organization, or analysis.
Contextualized Data
Raw data that has been processed, organized, and enriched with additional context or meaning.
Metadata
Data about data.
Descriptive Metadata
Aids users in finding, identifying, and selecting resources by describing them for search and discovery.
Structural Metadata
Describes the structure, type, and relationships of data.
Administrative Metadata
Carries technical details about a file or resource and is crucial for its identification, presentation, and preservation.
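The three metadata types can be pictured as three sections of one record attached to a file. The following is a minimal sketch; the dataset, every field name, and the helper `find_by_keyword` are hypothetical and only illustrate how the categories differ.

```python
# Hypothetical metadata record for a dataset file; all field names are illustrative.
metadata = {
    # Descriptive: what the resource is, for search and discovery.
    "descriptive": {"title": "Customer survey 2023", "keywords": ["survey", "customers"]},
    # Structural: how the data is organized and typed.
    "structural": {"columns": {"id": "int", "age": "int", "rating": "float"}},
    # Administrative: technical details needed to identify and preserve the file.
    "administrative": {"file_type": "CSV", "size_bytes": 20480, "created": "2023-06-01"},
}

def find_by_keyword(meta, keyword):
    """Return True if the keyword appears in the descriptive metadata."""
    return keyword in meta["descriptive"]["keywords"]

found = find_by_keyword(metadata, "survey")
column_names = list(metadata["structural"]["columns"])
```

Search uses only the descriptive section, while a program loading the file would rely on the structural and administrative sections.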