big data
data collections so enormous (terabytes or more) + complex (from sensor data to social media data) that traditional processes are incapable of dealing with them
big data uses
improve day-to-day operations, planning, + decision making
key characteristics of big data
volume, velocity, value, variety, veracity
technologies used to manage + process big data
data warehouses, extract transform load process, data marts, data lakes, NoSQL databases, Hadoop, in-memory databases
online transaction processing (OLTP) systems
traditionally used to capture transaction data, do not support the data analysis required today
data warehouses + data marts
allow organizations to access OLTP data, support decision making more effectively
data warehouse
large database, holds business information from many sources in the enterprise, covers all aspects of the company’s processes, products, + customers
extract transform load (ETL) process
extract data from a variety of sources, edits + transforms data into a data warehouse format, loads data into warehouse
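A minimal sketch of the three ETL steps, assuming two made-up sources and an in-memory SQLite table standing in for the warehouse. The names `CRM_ROWS`, `WEB_ROWS`, and the `sales` table are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical source records; field names and values are illustrative only.
CRM_ROWS = [{"Customer": " Ada ", "Total": "120.50"}, {"Customer": "Bo", "Total": "80"}]
WEB_ROWS = [{"name": "ada", "amount_cents": 999}]

def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("crm", row) for row in CRM_ROWS] + [("web", row) for row in WEB_ROWS]

def transform(tagged_rows):
    """Transform: edit every row into one common warehouse format."""
    cleaned = []
    for source, row in tagged_rows:
        if source == "crm":
            name, amount = row["Customer"], float(row["Total"])
        else:
            name, amount = row["name"], row["amount_cents"] / 100
        cleaned.append((name.strip().lower(), round(amount, 2), source))
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(transform(extract()), conn)
    return conn
```

After `run_etl()`, a query such as `SELECT customer, SUM(amount) FROM sales GROUP BY customer` sees both sources in one consistent format.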
data mart
subset of a data warehouse, used by small/medium-sized businesses + departments within large companies, supports decision making
data lake
takes a ‘store everything’ approach to big data, saves all data in its raw + unaltered form
NoSQL database
data modeled without two-dimensional tabular relations, uses horizontal scaling, does not require a predefined schema or conform to true ACID properties
structures used by NoSQL databases
more flexible than relational database tables, provide improved access speed + redundancy
categories of NoSQL databases
key value (two columns, key + value)
document (store, retrieve, + manage document-oriented information)
graph (well-suited for analyzing interconnection)
column (store data in columns)
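The four categories above can be pictured with plain Python structures. This is only a rough sketch of the data shapes, not how any particular NoSQL product stores data; all names and values are made up.

```python
# Key-value: every lookup goes through a single key.
kv_store = {"user:1": "Ada", "user:2": "Bo"}

# Document: each record is a self-contained, nested document with no fixed schema.
documents = [
    {"_id": 1, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Bo"},  # a document may omit fields others have
]

# Graph: nodes plus their connections, convenient for interconnection queries.
graph = {"Ada": {"Bo"}, "Bo": {"Ada", "Cy"}, "Cy": {"Bo"}}

# Column: values are stored per column rather than per row.
columns = {"name": ["Ada", "Bo"], "age": [36, 41]}

def friends_of_friends(network, person):
    """Graph-style query: people two hops away who are not direct friends."""
    direct = network[person]
    two_hops = set()
    for friend in direct:
        two_hops |= network[friend]
    return two_hops - direct - {person}

def column_average(store, column):
    """Column-style query: aggregate one column without touching the others."""
    values = store[column]
    return sum(values) / len(values)
```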
Hadoop
open-source software framework, includes several software modules, stores + processes extremely large data sets
distributed file system (HDFS)
used for data storage, divides the data into subsets, distributes the subsets onto different servers for processing
map reduce program
composite program, two components (Map procedure performs filtering + sorting, Reduce method performs a summary operation)
limitation → can only perform batch processing
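A single-machine sketch of the map and reduce idea, using word counting as the example. Real Hadoop distributes these phases across servers; the function names here are illustrative and not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: filter out non-alphabetic tokens, emit (word, 1) pairs, and sort them."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            if word.isalpha():  # filtering
                pairs.append((word, 1))
    return sorted(pairs)  # sorting

def group_by_key(pairs):
    """Group the sorted pairs so each key has the list of its values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: summarize each key's values, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines):
    return reduce_phase(group_by_key(map_phase(lines)))
```

The whole input is processed in one pass from start to finish, which is the batch style noted in the limitation above.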
in-memory database
stores the entire database in RAM, faster access to data, enables the analysis of big data + other challenging data-processing applications
feasibility due to two factors → increase in RAM capacities, corresponding decrease in RAM costs
business intelligence (BI)
wide range of applications, practices, + technologies, extracts, transforms, integrates, visualizes, analyzes, interprets, + presents data, supports improved decision making
analytics
extensive use of data + quantitative analysis, supports fact-based decision making within organizations
benefits of BI + analytics
detect fraud, improve forecasting, increase sales, optimize operations, reduce costs
data scientist
delivers real improvements in decision making, highly inquisitive person, strong business acumen, understands analytics
components for effective analytics + BI
existence of a solid data management program (includes governance), creative data scientists, strong commitment to data-driven decision making
descriptive analytics
preliminary data processing stage, identifies data patterns, answers questions
visual analytics
presentation of data pictorially or graphically
word cloud
visual depiction of a set of words (grouped together by frequency)
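A rough sketch of the counting behind a word cloud: tally word frequencies, then scale each word's display size by its count. The size range and the `word_sizes` helper are assumptions for illustration; actual rendering is left out.

```python
from collections import Counter

def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size proportional to how often it appears."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    if high == low:  # every word equally frequent: use one size
        return {word: max_size for word in counts}
    scale = (max_size - min_size) / (high - low)
    return {word: min_size + (n - low) * scale for word, n in counts.items()}
```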
conversion funnel
graphical representation that summarizes the steps a consumer takes toward becoming a customer
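A small sketch of the numbers a funnel chart would plot, assuming made-up step names and counts. Each step's rate is its count divided by the previous step's count.

```python
# Hypothetical funnel steps and visitor counts, for illustration only.
FUNNEL = [
    ("visited site", 1000),
    ("viewed product", 400),
    ("added to cart", 100),
    ("purchased", 25),
]

def step_rates(funnel):
    """Return (step, count, share of the previous step that continued)."""
    rows = []
    previous = None
    for name, count in funnel:
        rate = 1.0 if previous is None else count / previous
        rows.append((name, count, rate))
        previous = count
    return rows

def overall_conversion(funnel):
    """Share of people at the first step who reach the last step."""
    return funnel[-1][1] / funnel[0][1]
```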
regression analysis
determines the relationship between a dependent variable + one or more independent variables, produces a regression equation
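A minimal sketch of the simplest case, one independent variable fitted by ordinary least squares. The regression equation it produces has the form y = slope * x + intercept; the function name is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Evaluate the regression equation at a new x."""
    return slope * x + intercept
```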
predictive analytics
techniques to analyze current data, identifies future probabilities + trends, makes predictions
time series analysis
uses statistical methods, analyzes time series data, extracts meaningful statistics + characteristics
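One common time series statistic is the moving average, which smooths short-term noise to expose the trend. A minimal sketch, with the window size as an assumption chosen by the analyst:

```python
def moving_average(series, window):
    """Average each run of `window` consecutive observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

def period_changes(series):
    """Change from each observation to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]
```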
data mining
explores large amounts of data for hidden patterns, predicts future trends + behavior, used in decision making
techniques → association analysis, neural computing, case-based reasoning
optimization
allocate scarce resources to minimize costs + maximize profits
genetic algorithm
employs a natural selection-like process, finds approximate solutions to optimization + search problems
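A toy sketch of the idea, assuming a deliberately simple problem: evolve a bit string toward all ones. Population size, mutation rate, and the fitness function are illustrative choices, not recommended settings.

```python
import random

def fitness(bits):
    """Toy fitness: number of 1 bits (higher is better)."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # selection: fitter half survives
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_bits)  # crossover point
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally flip a bit.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness), best_history
```

Because survivors are carried over unchanged, the best fitness never drops between generations, but the result is still an approximate solution rather than a guaranteed optimum.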
linear programming
finds the optimum value of a linear expression, calculated based on the value of a set of decision variables (variables subject to a set of constraints)
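A small sketch for two decision variables, assuming a bounded feasible region. It checks every corner point where two constraint lines meet, which is a brute-force stand-in for real solvers such as the simplex method. The example problem (maximize 3x + 5y) is a made-up illustration.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints, eps=1e-9):
    """Maximize cx*x + cy*y subject to a*x + b*y <= c for each (a, b, c).

    Returns (value, x, y) for the best feasible corner, or None if none exists.
    """
    cx, cy = objective
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = cx * x + cy * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Hypothetical problem: x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
CONSTRAINTS = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
```

Calling `solve_lp_2d((3, 5), CONSTRAINTS)` evaluates the linear expression 3x + 5y at each feasible corner and keeps the largest.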
simulation
emulates the dynamic responses of a real-world system to various inputs
scenario analysis
predicts future values based on certain potential events
monte carlo simulation
provides a spectrum of thousands of possible outcomes, considers many variables + the range of potential values for each
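A minimal sketch, assuming a made-up profit model with three uncertain variables. Each trial draws one value per variable from its range, and thousands of trials give the spectrum of outcomes. All ranges and distributions below are hypothetical.

```python
import random
import statistics

def simulate_profit(trials=10_000, seed=42):
    """Run many trials, each drawing every uncertain variable from its range."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        units = rng.randint(800, 1200)            # units sold
        price = rng.uniform(9.0, 11.0)            # selling price per unit
        unit_cost = rng.triangular(5.0, 8.0, 6.0) # low, high, most likely
        outcomes.append(units * (price - unit_cost))
    return outcomes

def summarize(outcomes):
    """Report the spread of outcomes rather than a single forecast."""
    ordered = sorted(outcomes)
    n = len(ordered)
    return {
        "p5": ordered[int(0.05 * n)],
        "median": statistics.median(ordered),
        "p95": ordered[int(0.95 * n)],
    }
```

`summarize(simulate_profit())` returns a low, middle, and high profit estimate instead of one point value.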
text + video analysis
glean insights + data relevant to decision making
text analysis
process for extracting value from large quantities of unstructured text data
video analysis
process of obtaining information/insights from video footage
self-service analytics
training, techniques, + processes, empower end users to work independently (access data from approved sources, perform own analysis, use an endorsed set of tools)
advantages → gets valuable data to end users, accelerates decision making, encourages fact-based decision making, solution to data scientist shortage