Flashcards covering the fundamentals of data management, database structures, big data attributes, and various analytical tools like SQL, ETL, and Hadoop.
Data Management (DM)
According to Gartner Group, it consists of the practices, architectural techniques, and tools for achieving consistent access to and delivery of data across the spectrum of data subject areas and data structure types in the enterprise.
Data
Raw, unorganized facts that need context and processing to become useful.
Information
Data that has been processed, organized, and structured into a context that gives it meaning.
Database Management Systems (DBMS)
Software systems used to create and manage databases where data are stored in computer files called tables.
Tables
Storehouses of data within a database consisting of records (rows) and fields (columns).
Record
A row within a database table.
Field
A column within a database table.
Primary key
A field in a database table that uniquely identifies a record in the table, such as a unique student ID number.
Foreign key
A field in a database table that provides a link between two tables in a relational database.
Schema
The organization or layout of a database that defines the tables, fields, constraints, and keys; it serves as the blueprint of the database.
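The cards above (table, record, field, primary key, foreign key, schema) can be seen together in one small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the table names, field names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# The schema is the blueprint: tables, fields, constraints, and keys.
conn.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies a record
    name       TEXT NOT NULL         -- an ordinary field (column)
);
CREATE TABLE enrollments (
    enrollment_id INTEGER PRIMARY KEY,
    student_id    INTEGER NOT NULL REFERENCES students(student_id),  -- foreign key
    course        TEXT NOT NULL
);
""")

# Each INSERT adds one record (row).
conn.execute("INSERT INTO students VALUES (1, 'Ada')")
conn.execute("INSERT INTO enrollments VALUES (10, 1, 'Databases')")

# A foreign key pointing at a student that does not exist is rejected.
try:
    conn.execute("INSERT INTO enrollments VALUES (11, 99, 'Statistics')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
```

The foreign key is what links the two tables: every enrollment must refer to an existing student record.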
Big Data
Large and expansive collected data sets from various sources like smartphone metadata, social media, and internet usage records that are sifted for patterns and trends.
Volume
One of the 4 Vs of big data; refers to the sheer mass of digital data, which requires significant resources to manage.
Variety
One of the 4 Vs of big data referring to the fact that data comes in both structured and unstructured forms from various fragmented sources.
Veracity
One of the 4 Vs of big data referring to the quality and trustworthiness of the data, and whether it represents what it is believed to show.
Velocity
One of the 4 Vs of big data referring to the accelerating speed at which data is produced over a given time period.
Data mining (Data discovery)
The examination of huge sets of data to find patterns, connections, outliers, and hidden relationships to help businesses make informed decisions.
Structured data
Data that resides in fixed formats, is typically well-labeled with traditional fields and records, and fits easily into relational databases.
Unstructured data
Unorganized data that a computer cannot easily read because it is not stored in traditional rows and columns; it is commonly estimated to account for about 80% of all data.
Semi-structured data
Data that contains both structured elements and unstructured components, with common examples being email and HTML files.
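As a rough illustration of why email counts as semi-structured, the sketch below splits a message into its labeled header fields (structured) and its free-text body (unstructured) using Python's standard email module. The message itself is made up for the example.

```python
from email import message_from_string

# Hypothetical message: headers are labeled fields, the body is free text.
RAW_EMAIL = """From: ana@example.com
To: ben@example.com
Subject: Q3 numbers

Hi Ben, sales looked strong in the east, but I'm worried about returns.
"""

msg = message_from_string(RAW_EMAIL)

# Structured part: named fields that would fit neatly into table columns.
structured = {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}

# Unstructured part: prose with no fixed fields or records.
unstructured = msg.get_payload()
```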
Business Intelligence (BI)
A broad range of tools and practices that extract, analyze, and report information to assist in critical business strategic decision-making and pattern prediction.
Data lakes
Storage systems that hold large amounts of unstructured data in their raw form to allow for flexible analysis.
Structured Query Language (SQL)
The most widely used standard computer language for relational databases, used by programmers to manipulate and query data tables.
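A minimal sketch of SQL used to both manipulate and query a table, run through Python's built-in sqlite3 module. The sales table and its values are assumptions made for the example, not part of the source material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a small table and fill it with sample records.
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO sales VALUES (1, 'East', 120.0), (2, 'West', 80.0), (3, 'East', 50.0);
""")

# Manipulate: correct one record.
conn.execute("UPDATE sales SET amount = 90.0 WHERE sale_id = 2")

# Query: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

Here totals holds one (region, total) pair per region, which is the kind of summarized result a report or dashboard would display.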
Data Warehouse
A digital location used to consolidate disparate data from across an entire enterprise; can hold up to yottabytes of data.
Data Mart
A smaller data set or storage system designed to support the specific needs of a single department, such as sales or human resources.
Terabyte
A unit of data storage equal to 1,000 gigabytes.
Yottabyte
A unit of data storage equal to one trillion terabytes (1 × 10^12 terabytes).
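The unit relationships on the Terabyte and Yottabyte cards can be checked with a few lines of arithmetic. This sketch assumes decimal (SI) prefixes, where each unit is a power of ten bytes.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
YOTTABYTE = 10**24

# How many of the smaller unit fit in the larger one.
gigabytes_per_terabyte = TERABYTE // GIGABYTE    # 1,000
terabytes_per_yottabyte = YOTTABYTE // TERABYTE  # one trillion, i.e. 10**12
```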
ETL
An acronym for extract, transform, and load; it describes tools used to standardize data across systems and move it into a central location.
Extract (ETL)
The first step in ETL where data is taken from its original source, such as CRM or ERP systems.
Transform (ETL)
The second step in ETL where extracted data is reformatted or cleaned (e.g., removing decimals) to fit into a structured database table.
Load (ETL)
The final step in ETL where transformed data is transferred into the data warehouse or data mart.
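The three ETL steps can be sketched end to end in a few lines. This is an illustrative toy, not how any particular ETL tool works: the CSV text stands in for a CRM export, the function names are hypothetical, "removing decimals" is assumed to mean truncating to whole numbers, and an in-memory SQLite table plays the role of the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system such as a CRM.
RAW_CRM_EXPORT = """customer,amount
 alice ,19.99
BOB,5.50
"""

def extract(text):
    """Extract: read the records out of the original source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean names and remove decimals so rows fit the target table."""
    return [(r["customer"].strip().title(), int(float(r["amount"]))) for r in rows]

def load(conn, rows):
    """Load: write the cleaned rows into the central store."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CRM_EXPORT)))
loaded = conn.execute("SELECT customer, amount FROM orders ORDER BY customer").fetchall()
```

Keeping each step as its own function mirrors the cards: the source format can change without touching the load step, and the target table can change without touching extraction.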
Hadoop
An infrastructure for storing and processing large, distributed sets of unstructured and semi-structured data across multiple servers.
Tableau
A business intelligence tool that produces interactive data visualizations, such as graphs and charts, to simplify raw data into information.
Dashboard
A tool used for the presentation of complex data in an easier-to-understand visual form, often providing an overall picture of business efficiency.