What is a relational database?
A structured database that stores data in tables (rows and columns) and links them through keys (primary and foreign keys). It is typically queried with SQL and enforces a defined schema.
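A minimal sketch of keys and schema enforcement, using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled

# Hypothetical tables: each order must point at an existing customer.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# An order for a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The database itself rejects the inconsistent row, which is what "enforces schema consistency" means in practice.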
What is a non-relational (NoSQL) database?
A flexible database that does not require a fixed schema; it stores data as documents, key-value pairs, graphs, or wide-column structures. Designed for horizontal scalability and unstructured or semi-structured data.
What is a .csv file used for?
Stores tabular data in plain text using commas as delimiters; commonly used for data import/export.
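A short sketch of a CSV round trip with Python's csv module, assuming made-up product rows. It writes to an in-memory buffer rather than a real file.

```python
import csv
import io

buffer = io.StringIO()  # stands in for a .csv file on disk
writer = csv.writer(buffer)
writer.writerow(["product", "quantity"])
writer.writerow(["Widget, large", 3])  # a comma inside a field gets quoted automatically
writer.writerow(["Gadget", 5])

buffer.seek(0)
rows = list(csv.DictReader(buffer))  # first line is used as the header

print(rows[0]["product"], rows[0]["quantity"])
```

Note that CSV carries no type information: the quantity comes back as the string "3", so numeric columns need converting after import.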
What is a .json file used for?
Stores semi-structured data using key-value pairs and nested objects; commonly used in APIs and web applications.
What is a .xlsx file?
Microsoft Excel file used for spreadsheets, calculations, and structured tabular data with formulas.
What is a .txt file?
Plain text file with no formatting; used for logs, notes, or raw data.
What is a .dat file?
Generic data file that can store structured or unstructured data depending on the application.
What is a .jpg file in data contexts?
Binary image file; represents unstructured data used in machine learning or image processing.
What is structured data?
Highly organized data stored in rows/columns with a fixed schema (e.g., SQL tables).
What is semi-structured data?
Data that does not follow a rigid schema but has organizational markers like tags or keys (e.g., JSON, XML).
What is unstructured data?
Data with no predefined format (e.g., images, videos, emails, social media posts).
What is a fact table?
A table that stores measurable metrics (e.g., sales, revenue, quantity) used for analysis.
What is a dimension table?
A table that stores descriptive attributes (e.g., customer name, product type, date).
What is a schema?
The blueprint of a database defining tables, fields, relationships, and constraints.
What is a slowly changing dimension (SCD)?
A dimension that tracks historical changes in data attributes over time (e.g., customer address changes).
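One common way to keep that history is to close the old row and add a new one (often called a Type 2 approach). The sketch below assumes hypothetical column names (`valid_from`, `valid_to`, `is_current`) and a made-up customer.

```python
from datetime import date

# Hypothetical customer dimension with one current row.
customer_dim = [
    {"customer_id": 1, "address": "12 Oak St",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]

def change_address(rows, customer_id, new_address, change_date):
    """Close the current row for the customer and append a new current row."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({"customer_id": customer_id, "address": new_address,
                 "valid_from": change_date, "valid_to": None, "is_current": True})

change_address(customer_dim, 1, "9 Elm Rd", date(2023, 6, 1))
current = [r for r in customer_dim if r["is_current"]]

print(len(customer_dim), current[0]["address"])
```

The old address stays in the table with an end date, so past facts can still be reported against the address that was valid at the time.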
What is a bridge table?
A table used to resolve many-to-many relationships between two entities.
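A sketch of a bridge table in SQLite, assuming made-up `students` and `courses` tables. The hypothetical `enrollments` table holds one row per student-course pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT);
    -- Bridge table: the composite key allows each pairing only once.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        course_id INTEGER REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO courses VALUES (10, 'SQL'), (20, 'Statistics');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Walk through the bridge to list which student takes which course.
pairs = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN courses c ON c.course_id = e.course_id
    ORDER BY s.name, c.title
""").fetchall()

print(pairs)
```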
What is nested JSON data?
JSON objects contained within other objects, allowing hierarchical data representation.
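A small sketch of reading nested JSON with Python's json module. The record and the `flatten` helper are illustrative, not a standard API.

```python
import json

# Made-up record with two levels of nesting under "customer".
raw = '{"id": 7, "customer": {"name": "Ada", "address": {"city": "Leeds"}}, "tags": ["new", "vip"]}'
record = json.loads(raw)

# Nested objects are reached by chaining keys.
city = record["customer"]["address"]["city"]

def flatten(obj, prefix=""):
    """Turn nested objects into one level of dotted keys (lists are left as-is)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

flat_record = flatten(record)
print(city, sorted(flat_record))
```

Flattening like this is one way to prepare nested data for a tabular format such as CSV or a SQL table.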
What is a string data type?
Text data (char, varchar, nvarchar) used for names, descriptions, and labels.
What is a numeric data type?
Data used for calculations (integers, decimals, floats).
What is a boolean data type?
Logical data type representing True/False values.
What is a datetime data type?
Stores date and time values, often used for timestamps.
What is a NULL value?
Represents missing, unknown, or undefined data.
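A sketch of how NULL behaves in SQL, using SQLite and a made-up `readings` table. Python's `None` is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("a", 1.5), ("b", None), ("c", 3.0)])

# Comparing with "= NULL" never matches, because NULL is unknown rather than a value.
equals_null = conn.execute("SELECT COUNT(*) FROM readings WHERE value = NULL").fetchone()[0]

# "IS NULL" is the correct test.
is_null = conn.execute("SELECT COUNT(*) FROM readings WHERE value IS NULL").fetchone()[0]

# COUNT(*) counts rows; COUNT(column) skips NULLs.
total_rows, non_null_values = conn.execute(
    "SELECT COUNT(*), COUNT(value) FROM readings").fetchone()

print(equals_null, is_null, total_rows, non_null_values)
```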
What is a GUID/UUID?
A globally unique identifier used to uniquely label records across systems.
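A brief sketch of generating an identifier with Python's uuid module. Version 4 (random) is used here as one common choice.

```python
import uuid

record_id = uuid.uuid4()        # random 128-bit identifier
as_text = str(record_id)        # 36-character form, e.g. for storing in a text column
parsed = uuid.UUID(as_text)     # parse the text back into a UUID object

print(len(as_text), parsed == record_id)
```

Because the identifier does not depend on a central counter, separate systems can create records independently and merge them later with a negligible chance of collision.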
What is a BLOB?
Binary Large Object used to store large binary files like images or multimedia.
What is a CLOB?
Character Large Object used to store large text data (documents, logs).
What is a database as a data source?
A structured system storing organized data accessible via queries (SQL/NoSQL).
What is an API data source?
A system that allows applications to exchange data using endpoints (often JSON/XML).
What is website data?
Data scraped or collected from web pages (HTML content, tables, metadata).
What are logs as a data source?
System-generated records of events (errors, user activity, system performance).
What is a data warehouse?
Centralized repository for structured, historical data optimized for analytics.
What is a data lake?
Storage system for raw structured, semi-structured, and unstructured data.
What is a data mart?
A subset of a data warehouse focused on a specific business area.
What is a data silo?
Isolated data stored separately and not easily accessible across systems.
What is cloud computing?
Delivery of computing resources (storage, servers, databases) over the internet.
What are the major cloud providers?
AWS, Microsoft Azure, Google Cloud Platform (GCP).
What is a public cloud?
Cloud infrastructure owned and operated by a third-party provider and shared across multiple organizations.
What is a private cloud?
Cloud infrastructure dedicated to a single organization.
What is a hybrid cloud?
Combination of public and private cloud environments.
What is object storage?
Storage method that stores data as objects with metadata (e.g., images, backups).
What is block storage?
Storage that splits data into fixed-size blocks for high-performance access.
What is file storage?
Traditional hierarchical storage using folders and files.
What is containerization?
Packaging applications with dependencies into isolated containers (e.g., Docker).
What is an IDE?
Integrated Development Environment: software used to write, run, and debug code (e.g., VS Code, RStudio).
What are notebooks used for?
Interactive coding environments for data analysis (e.g., Jupyter Notebook).
What is Tableau / Power BI used for?
Business intelligence tools for dashboards and data visualization.
What is SQL used for?
Querying and managing relational databases.
What is Python used for in data analysis?
Data manipulation, automation, analysis, and machine learning.
What is pandas?
Python library for data manipulation and analysis (DataFrames).
What is generative AI?
AI that creates new content (text, images, code) based on learned patterns.
What is a large language model (LLM)?
AI model trained on large text datasets to understand and generate human language.
What is NLP?
Natural Language Processing—AI that interprets and processes human language.
What is deep learning?
Machine learning using neural networks with multiple layers.
What is RPA?
Robotic Process Automation—automates repetitive business tasks.
What is ETL?
Extract, Transform, Load—data is transformed before loading into a system.
What is ELT?
Extract, Load, Transform—data is loaded first, then transformed in the system.
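The difference between ETL and ELT is where the transformation runs. The sketch below uses SQLite as a stand-in target system and made-up sales rows; the table names are hypothetical.

```python
import sqlite3

# Extract: made-up source rows with messy names and amounts stored as text.
raw_rows = [("  ada ", "10.50"), ("GRACE", "7.25")]

conn = sqlite3.connect(":memory:")

# ETL: transform in application code first, then load the cleaned rows.
conn.execute("CREATE TABLE etl_sales (customer TEXT, amount REAL)")
cleaned = [(name.strip().title(), float(amount)) for name, amount in raw_rows]
conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", cleaned)

# ELT: load the raw rows unchanged, then transform inside the database with SQL.
conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)
conn.execute("""
    CREATE TABLE elt_sales AS
    SELECT UPPER(SUBSTR(TRIM(customer), 1, 1)) || LOWER(SUBSTR(TRIM(customer), 2)) AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
""")

etl_result = conn.execute("SELECT customer, amount FROM etl_sales ORDER BY customer").fetchall()
elt_result = conn.execute("SELECT customer, amount FROM elt_sales ORDER BY customer").fetchall()
print(etl_result == elt_result)
```

Both routes end with the same cleaned table; ELT also keeps the untouched `raw_sales` copy in the target system, so the transformation can be rerun or changed later.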
What is data integration?
Combining data from multiple sources into a unified view.
What is data sampling?
Selecting a subset of data for analysis.
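A sketch of simple random sampling without replacement, using Python's random module. The population and the seed are arbitrary choices for illustration.

```python
import random

population = list(range(1, 1001))   # made-up record IDs 1..1000
rng = random.Random(42)             # fixed seed so the sample is reproducible
sample = rng.sample(population, k=50)

print(len(sample), len(set(sample)))
```

Each record can appear at most once in the sample, and rerunning with the same seed selects the same records.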
What is data aggregation?
Summarizing data (e.g., sum, average, count).
What is a join in SQL?
Combining tables based on related columns.
What is filtering in data queries?
Selecting rows that meet specific conditions.
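Aggregation, joins, and filtering often appear together in one query. The sketch below runs such a query in SQLite against made-up `customers` and `orders` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 40.0), (4, 3, 15.0);
""")

summary = conn.execute("""
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id   -- join on the related column
    WHERE o.amount >= 10                                 -- filter: keep orders of 10 or more
    GROUP BY c.region                                    -- aggregate per region
    ORDER BY c.region
""").fetchall()

print(summary)  # [('North', 2, 35.0), ('South', 1, 40.0)]
```

The 5.0 order is removed by the filter before the totals are computed, so it does not count toward North.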
What are missing values?
Data fields with no recorded value.
What is duplication in data?
Repeated records that can skew analysis.
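A sketch of how a duplicate inflates a total, and one way to drop exact repeats. The records are made up, and treating `email` plus `amount` as the identity of a record is an assumption for this example.

```python
records = [
    {"email": "ada@example.com", "amount": 20},
    {"email": "grace@example.com", "amount": 15},
    {"email": "ada@example.com", "amount": 20},  # exact repeat of the first record
]

seen = set()
unique_records = []
for record in records:
    key = (record["email"], record["amount"])  # assumed identity of a record
    if key not in seen:
        seen.add(key)
        unique_records.append(record)

total_with_duplicates = sum(r["amount"] for r in records)
total_deduplicated = sum(r["amount"] for r in unique_records)
print(total_with_duplicates, total_deduplicated)  # 55 35
```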
What are outliers?
Data points significantly different from the rest.
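One common rule of thumb flags values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies it to made-up values; the 1.5 multiplier is a convention, not a fixed definition.

```python
import statistics

values = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up measurements

q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]
print(outliers)  # [95]
```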
What is data completeness?
Measure of whether all required data is present.
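Completeness can be expressed as the share of rows with a value in each required field. The sketch below assumes made-up rows and treats `None` and empty strings as missing.

```python
rows = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "email": None},
    {"name": "", "email": "linus@example.com"},
    {"name": "Edsger", "email": None},
]
required_fields = ["name", "email"]

def is_missing(value):
    # Assumption for this example: None and "" both count as missing.
    return value is None or value == ""

completeness = {
    field: sum(not is_missing(row.get(field)) for row in rows) / len(rows)
    for field in required_fields
}
print(completeness)  # {'name': 0.75, 'email': 0.5}
```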
What is data validation?
Ensuring data meets defined rules and formats.
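A sketch of rule-based validation on a single record. The three rules (email shape, age range, date format) and the field names are hypothetical, and the email pattern is deliberately simple.

```python
import re
from datetime import datetime

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simple shape check, not RFC-complete

def validate(row):
    """Return a list of rule violations; an empty list means the row passed."""
    errors = []
    if not EMAIL_PATTERN.match(row.get("email", "")):
        errors.append("email: invalid format")
    if not isinstance(row.get("age"), int) or not 0 <= row["age"] <= 120:
        errors.append("age: must be an integer from 0 to 120")
    try:
        datetime.strptime(row.get("signup_date", ""), "%Y-%m-%d")
    except ValueError:
        errors.append("signup_date: expected YYYY-MM-DD")
    return errors

good = validate({"email": "ada@example.com", "age": 36, "signup_date": "2024-02-29"})
bad = validate({"email": "not-an-email", "age": 150, "signup_date": "29/02/2024"})
print(good, len(bad))  # [] 3
```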